BinExport

BinExport is a utility hosted on Google’s GitHub that exports the disassembly obtained from IDA Pro, Binary Ninja and Ghidra into an external file. The file is serialized in Protobuf format. It is notably used by BinDiff as the input for its diffing.

For more details: Project homepage

Limitations: Although it works like a charm, BinExport lacks the following two features:

  • Regardless of the disassembler (IDA, Ghidra, ...), BinExport has to be launched manually through the GUI, or via a shell command if the disassembler supports headless mode. There are no programmatic bindings to trigger an export.

  • The exported data is very useful, but BinExport was designed as an internal format, so there is no API to manipulate the exported data.

As such, python-binexport has been developed to address these two limitations.

Python-Binexport

python-binexport is a Python module that aims to provide a friendly interface to load and manipulate BinExport files.

What is BinExport?

BinExport is a Protobuf format used by BinDiff to extract an IDA database and process it outside of the disassembler. It gives a very compact (size-optimized) representation of the program.

Dependencies

Python-binexport can load any .BinExport file generated from the supported disassemblers: IDA, Ghidra and Binary Ninja.

However, to perform the export with binexporter or through the API ProgramBinExport.from_binary_file(), the IDA plugin must be installed, as IDA is the only disassembler supported for exporting at the moment. The plugin has to be installed first from the GitHub page. This feature requires IDA >= 7.2 (as it calls the BinExportBinary IDC function).

[!WARNING] If you export files from python-binexport make sure the IDA Pro binexport plugin is properly installed and works when running it manually before trying to use it from the python library (it can hang if not properly installed).

[!NOTE] The possibility to export files using Ghidra, or Binary Ninja from python-binexport might be supported in the future.

Installation

pip install python-binexport

Python module usage

The main intended usage of python-binexport is as a python module. The main entry point is the class ProgramBinExport which triggers the loading of the whole file. Here is a snippet to iterate on every expression of every instruction in the program:

from binexport import ProgramBinExport

p = ProgramBinExport("myprogram.BinExport")
for fun_addr, fun in p.items():
    for bb_addr, bb in fun.items():
            for inst_addr, inst in bb.instructions.items():
            for operand in inst.operands:
                for exp in operand.expressions:
                    pass  # Do whatever at such deep level

ProgramBinExport, FunctionBinExport, InstructionBinExport and OperandBinExport all provide various attributes and methods to get their type and other information.
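As an illustration of the traversal above, here is a minimal sketch that counts mnemonics over a whole program. It assumes basic blocks expose their instructions through the `instructions` property listed in the API section. The helper name `mnemonic_histogram` and the stand-in objects are hypothetical; they only mimic the documented attributes so that the sketch runs without a .BinExport file.

```python
from collections import Counter
from types import SimpleNamespace


def mnemonic_histogram(program):
    """Count how often each mnemonic occurs across all functions of `program`."""
    counts = Counter()
    for fun in program.values():                   # address -> function
        for bb in fun.values():                    # address -> basic block
            for inst in bb.instructions.values():  # address -> instruction
                counts[inst.mnemonic] += 1
    return counts


# Stand-in data (hypothetical): plain dicts and namespaces instead of a real
# ProgramBinExport loaded from disk.
demo_block = SimpleNamespace(instructions={
    0x1000: SimpleNamespace(mnemonic="push"),
    0x1001: SimpleNamespace(mnemonic="mov"),
    0x1004: SimpleNamespace(mnemonic="mov"),
})
demo_program = {0x1000: {0x1000: demo_block}}

histogram = mnemonic_histogram(demo_program)
print(histogram.most_common(1))
```

With a real export, the same function should accept a ProgramBinExport instance directly, since only `values()`, `instructions` and `mnemonic` are used.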

If the module idascript is installed, you can directly generate a BinExport file using the ProgramBinExport.from_binary_file static method.

Command line usage

The executable script binexporter provides a very basic utility to export a BinExport file straight from the command line (without having to launch IDA, etc.). This is basically a wrapper for ProgramBinExport.from_binary_file.

API

Program

class binexport.program.ProgramBinExport(file: Path | str)[source]

Bases: dict

Program class that wraps the binexport with high-level functions and an easy to use API. It inherits from a dict which is used to reference all functions based on their address.

Parameters:

file – BinExport file path

property architecture: str

Returns the architecture suffixed with the address size, e.g. x86_64, x86_32
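When the address size is needed as a number, the string can be split on its last underscore. The sketch below assumes every architecture string follows the `<family>_<bits>` pattern of the two examples given; `split_architecture` is a hypothetical helper, not part of the library.

```python
def split_architecture(arch: str) -> tuple[str, int]:
    """Split a string such as 'x86_64' into ('x86', 64)."""
    family, _, bits = arch.rpartition("_")
    # rpartition leaves `family` empty when there is no underscore at all.
    if not family or not bits.isdigit():
        raise ValueError(f"unexpected architecture string: {arch!r}")
    return family, int(bits)


family, bits = split_architecture("x86_64")
print(family, bits)
```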

callgraph: DiGraph

program call graph (as a DiGraph)

static from_binary_file(exec_file: Path | str, output_file: str | Path = '', open_export: bool = True, override: bool = False) ProgramBinExport | bool[source]

Generate the .BinExport file for the given program and return an instance of ProgramBinExport.

Warning

That function requires the module idascript

Parameters:
  • exec_file – executable file path

  • output_file – BinExport output file

  • open_export – whether or not to open the binexport after export

  • override – Override the .BinExport if already existing. (default false)

Returns:

an instance of ProgramBinExport if open_export is true, otherwise a boolean indicating whether the export succeeded
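Because the return type is a union, callers may want to normalize it. The following sketch wraps the call and raises on failure. The wrapper name `export_or_raise` and its `exporter` parameter are hypothetical, and treating a boolean result as a failed export when open_export is true is an assumption of this sketch rather than documented behaviour.

```python
def export_or_raise(exec_file, output_file="", exporter=None):
    """Export `exec_file` and return the loaded program, raising on failure."""
    if exporter is None:
        # Imported lazily so the helper can be exercised without IDA installed.
        from binexport import ProgramBinExport
        exporter = ProgramBinExport.from_binary_file

    result = exporter(exec_file, output_file=output_file,
                      open_export=True, override=False)
    # Documented return type: ProgramBinExport | bool. With open_export=True a
    # boolean (or None) is interpreted here as "export failed" (assumption).
    if result is None or isinstance(result, bool):
        raise RuntimeError(f"BinExport export failed for {exec_file}")
    return result
```

Passing a custom `exporter` callable makes the wrapper testable without IDA; in normal use it can be omitted.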

fun_names: Dict[str, FunctionBinExport]

dictionary mapping function name -> function

property name: str

Return the name of the program (as exported by binexport)

property proto: BinExport2

Returns the protobuf object associated to the program

Function

class binexport.function.FunctionBinExport(program: ReferenceType[ProgramBinExport], *, pb_fun: BinExport2.FlowGraph | None = None, is_import: bool = False, addr: int | None = None)[source]

Function object. Also references its parents and children (functions it calls).

Constructor. Iterates over the FlowGraph structure and initializes all the basic blocks and instructions accordingly.

Parameters:
  • program – weak reference to program (used to navigate pb fields contained inside)

  • pb_fun – FlowGraph protobuf structure

  • is_import – whether or not it’s an import function (if so, basic blocks etc. are not initialized)

  • addr – address of the function (info available in the call graph)

addr: int | None

address, None if imported function

property blocks: Dict[int, BasicBlockBinExport]

Returns a dict which is used to reference all basic blocks by their address. Accessing this property will also load the CFG. The dict is cached by default; to erase the cache, delete the attribute.

Returns:

dictionary of addresses to basic blocks

children: Set[FunctionBinExport]

set of functions called by this one
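The `children` sets are enough to walk the call graph without touching the DiGraph. Below is a minimal breadth-first sketch that collects every function transitively called from a starting point. `FakeFunction` is a hypothetical stand-in exposing only `name` and `children`, and `reachable_callees` is not part of the library.

```python
from collections import deque


class FakeFunction:
    """Hypothetical stand-in exposing only `name` and `children`."""

    def __init__(self, name):
        self.name = name
        self.children = set()


def reachable_callees(function):
    """Return the set of functions transitively called by `function`."""
    seen = set()
    queue = deque(function.children)
    while queue:
        callee = queue.popleft()
        if callee in seen:
            continue  # already visited, also guards against recursion cycles
        seen.add(callee)
        queue.extend(callee.children)
    return seen


main, parse, log = FakeFunction("main"), FakeFunction("parse"), FakeFunction("log")
main.children = {parse}
parse.children = {log, parse}  # parse calls itself recursively
callee_names = sorted(f.name for f in reachable_callees(main))
print(callee_names)
```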

property graph: DiGraph

The networkx CFG associated to the function.

is_import() bool[source]

Returns whether or not the function is an import

items() ItemsView[int, BasicBlockBinExport][source]

Each function is associated to a dictionary with key-value Addr->BasicBlockBinExport. This returns items of the dictionary.

keys() KeysView[int][source]

Each function is associated to a dictionary with key-value : Addr, BasicBlockBinExport. This returns the keys of the dictionary.

property name: str

Name of the function if it exists; otherwise an IDA-style name of the form sub_XXX

parents: Set[FunctionBinExport]

set of functions that call this one

property program: ProgramBinExport

ProgramBinExport to which this function belongs.

property type: FunctionType

Type of the function as a FunctionType

Returns:

type enum of the function

property uncached_blocks: dict[int, BasicBlockBinExport]

Returns a dict which is used to reference all basic blocks by their address. Accessing this property will also load the CFG. The object returned is not cached, so each access builds a new, equivalent object. If you want the object to be cached, use FunctionBinExport.blocks instead.

Returns:

dictionary of addresses to basic blocks
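The difference between the cached and uncached properties can be illustrated with `functools.cached_property`, whose cache is also cleared by deleting the attribute. This is only a sketch of the semantics described above: whether the library actually relies on `cached_property` is an assumption, and `FakeFunction` with its `_build_blocks` method is a hypothetical stand-in.

```python
from functools import cached_property


class FakeFunction:
    """Hypothetical stand-in; `_build_blocks` imitates the costly CFG load."""

    def __init__(self):
        self.builds = 0

    def _build_blocks(self):
        self.builds += 1
        return {0x1000: "bb0", 0x1010: "bb1"}

    @property
    def uncached_blocks(self):
        return self._build_blocks()  # rebuilt on every access

    @cached_property
    def blocks(self):
        return self.uncached_blocks  # built once, then stored on the instance


fun = FakeFunction()
first, second = fun.blocks, fun.blocks  # one build, same dict object
del fun.blocks                          # drop the cached value
third = fun.blocks                      # triggers a second build
print(fun.builds)
```

Deleting the attribute is mainly useful to release memory after processing a large function.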

values() ValuesView[BasicBlockBinExport][source]

Each function is associated to a dictionary with key-value : Addr, BasicBlockBinExport. This returns the values of the dictionary.

Basic Block

class binexport.function.BasicBlockBinExport(program: ReferenceType[ProgramBinExport], function: ReferenceType[FunctionBinExport], pb_bb: BinExport2.BasicBlock)[source]

Bases: object

Basic block class.

Parameters:
  • program – Weak reference to the program

  • function – Weak reference to the function

  • pb_bb – protobuf definition of the basic block

addr: int

basic block address

bytes

bytes of the basic block

property function: FunctionBinExport

Wrapper on weak reference on FunctionBinExport

Returns:

object FunctionBinExport, function associated to the basic block

property instructions: dict[int, InstructionBinExport]

Returns a dict which is used to reference all the instructions in this basic block by their address. The object returned is cached by default; to erase the cache, delete the attribute.

Returns:

dictionary of addresses to instructions

property program: ProgramBinExport

Wrapper on weak reference on ProgramBinExport

Returns:

object ProgramBinExport, program associated to the basic block

property uncached_instructions: dict[int, InstructionBinExport]

Returns a dict which is used to reference all the instructions in this basic block by their address. The object returned is not cached, so each access builds a new, equivalent object. If you want the object to be cached, use BasicBlockBinExport.instructions instead.

Returns:

dictionary of addresses to instructions
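The basic block attributes above can be combined into a quick per-function summary. The sketch below assumes `bytes` is a bytes-like object supporting `len()`. The helper `block_summary` and the stand-in data are hypothetical.

```python
from types import SimpleNamespace


def block_summary(function):
    """Return (address, size in bytes, instruction count) per basic block."""
    return [
        (addr, len(bb.bytes), len(bb.instructions))
        for addr, bb in sorted(function.items(), key=lambda item: item[0])
    ]


# Stand-in function (hypothetical): a dict of address -> basic block.
demo_function = {
    0x1010: SimpleNamespace(bytes=b"\xc3", instructions={0x1010: "ret"}),
    0x1000: SimpleNamespace(bytes=b"\x55\x48\x89\xe5",
                            instructions={0x1000: "push", 0x1001: "mov"}),
}
summary = block_summary(demo_function)
print(summary)
```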

Instruction

class binexport.instruction.InstructionBinExport(program: ReferenceType[ProgramBinExport], function: ReferenceType[FunctionBinExport], addr: int, i_idx: int)[source]

Instruction class. It represents an instruction with its operands.

Parameters:
  • program – Weak reference to the program

  • function – Weak reference to the function

  • addr – address of the instruction (computed outside)

  • i_idx – instruction index in the protobuf data structure

addr: int

instruction address

bytes

bytes of the instruction (opcodes)

data_refs: Set[int]

Data references address

property mnemonic: str

Mnemonic string as gathered by binexport (with prefix).

property operands: List[OperandBinExport]

Returns a list of the operands, instantiated dynamically on demand. The list is cached by default; to erase the cache, delete the attribute.

property pb_instr: BinExport2.Instruction

Protobuf instruction object.

property program: ProgramBinExport

Program associated with this instruction.

property uncached_operands: list[OperandBinExport]

Returns a list of the operands, instantiated dynamically on demand. The object returned is not cached, so each access builds a new, equivalent object. If you want the object to be cached, use InstructionBinExport.operands instead.
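As a small usage sketch for the instruction attributes, the helper below formats one line of a listing from `addr`, `bytes`, `mnemonic` and `data_refs`. It assumes `bytes` is a Python bytes object. `format_instruction` and the stand-in instruction are hypothetical.

```python
from types import SimpleNamespace


def format_instruction(inst):
    """Format an instruction as 'address  hex-bytes  mnemonic [; data: refs]'."""
    text = f"{inst.addr:#010x}  {inst.bytes.hex():<16}  {inst.mnemonic}"
    if inst.data_refs:
        refs = ", ".join(f"{ref:#x}" for ref in sorted(inst.data_refs))
        text += f"  ; data: {refs}"
    return text


# Stand-in instruction (hypothetical values).
demo_inst = SimpleNamespace(
    addr=0x401000,
    bytes=b"\x48\x8b\x05\x10\x20\x00\x00",
    mnemonic="mov",
    data_refs={0x403017},
)
line = format_instruction(demo_inst)
print(line)
```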

Operand

class binexport.operand.OperandBinExport(program: ReferenceType[ProgramBinExport], function: ReferenceType[FunctionBinExport], instruction: ReferenceType[InstructionBinExport], op_idx: int)[source]

Operand object. Provide access to the underlying expression.

Parameters:
  • program – Weak reference to the program

  • function – Weak reference to the function

  • instruction – Weak reference to the instruction

  • op_idx – operand index in protobuf structure

property expressions: List[ExpressionBinExport]

Iterates over all the operand expressions in pre-order (binary operator first). The list is cached by default; to erase the cache, delete the attribute.

property function: FunctionBinExport

Function object associated to this operand.

property instruction: InstructionBinExport

Instruction object associated to this operand.

property pb_operand: BinExport2.Operand

Protobuf operand object in the protobuf structure.

property program: ProgramBinExport

Program object associated to this operand.

property uncached_expressions: List[ExpressionBinExport]

Iterates over all the operand expressions in pre-order (binary operator first). The object returned is not cached, so each access builds a new, equivalent object. If you want the object to be cached, use OperandBinExport.expressions instead.

Expression

class binexport.expression.ExpressionBinExport(program: ProgramBinExport, function: FunctionBinExport, instruction: InstructionBinExport, exp_idx: int, parent: ExpressionBinExport | None = None)[source]

Class that represents an expression node in the expression tree of a specific operand. The tree is inverted (each node has an edge to its parent).

Parameters:
  • program – reference to program

  • function – reference to function

  • instruction – reference to instruction

  • exp_idx – expression index in the protobuf table

  • parent – reference to the parent expression in the tree. None if it is the root.

property depth: int

Returns the depth of the node in the tree (root is depth 0).

is_addr: bool

whether the value is referring to an address

is_data: bool

whether the value is a reference to data

parent: ExpressionBinExport | None

parent expression if nested

property type: ExpressionType

Returns the type of the expression, as defined in ExpressionType, after the protobuf parsing

property value: str | int | float

Returns the value of the expression, after the protobuf parsing

Returns:

value of the expression
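Since each node only points to its parent, the depth can be recovered by walking the `parent` chain, and a pre-order list can then be rendered as an indented tree. The sketch below uses a hypothetical `FakeExpression` stand-in with the documented `value` and `parent` attributes; the tree for `rax+8` is illustrative, and the exact shape BinExport produces for a given operand may differ.

```python
class FakeExpression:
    """Hypothetical stand-in with the documented `value` and `parent`."""

    def __init__(self, value, parent=None):
        self.value = value
        self.parent = parent


def depth_of(expression):
    """Count the edges between `expression` and the root of its tree."""
    depth = 0
    while expression.parent is not None:
        expression = expression.parent
        depth += 1
    return depth


def render(expressions):
    """Render a pre-order list of expressions as an indented tree."""
    return "\n".join("  " * depth_of(e) + str(e.value) for e in expressions)


plus = FakeExpression("+")       # root: binary operator first
reg = FakeExpression("rax", plus)
imm = FakeExpression(8, plus)
tree_text = render([plus, reg, imm])
print(tree_text)
```

On real objects, the `depth` property should make `depth_of` unnecessary; it is spelled out here only to show how the inverted tree is navigated.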

Types

binexport.types.Addr

An integer representing an address within a program

enum binexport.types.ExpressionType(value)[source]

Expression type derived from protobuf expression types.

Valid values are as follows:

FUNC_NAME = <ExpressionType.FUNC_NAME: 1>

function name

VAR_NAME = <ExpressionType.VAR_NAME: 2>

variable name

IMMEDIATE_INT = <ExpressionType.IMMEDIATE_INT: 3>

immediate value

IMMEDIATE_FLOAT = <ExpressionType.IMMEDIATE_FLOAT: 4>

float expression

SYMBOL = <ExpressionType.SYMBOL: 5>

symbol expression

REGISTER = <ExpressionType.REGISTER: 6>

register expression

SIZE = <ExpressionType.SIZE: 7>

size expression (byte, dword ..)
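A typical use of these values is filtering expressions by kind, for instance to list the registers an instruction touches. In the sketch below the enum is a local stand-in that mirrors the members listed above so the code runs on its own; with the library installed it would be imported from binexport.types instead. `registers_used` and the demo instruction are hypothetical.

```python
from enum import Enum
from types import SimpleNamespace


class ExpressionType(Enum):
    """Local stand-in mirroring the documented members."""
    FUNC_NAME = 1
    VAR_NAME = 2
    IMMEDIATE_INT = 3
    IMMEDIATE_FLOAT = 4
    SYMBOL = 5
    REGISTER = 6
    SIZE = 7


def registers_used(instruction):
    """Return register names appearing in the operands, in first-seen order."""
    registers = []
    for operand in instruction.operands:
        for exp in operand.expressions:
            if exp.type is ExpressionType.REGISTER and exp.value not in registers:
                registers.append(exp.value)
    return registers


def _exp(kind, value):
    return SimpleNamespace(type=kind, value=value)


# Stand-in for something like `mov rax, [rbx+8]` (illustrative structure only).
demo_inst = SimpleNamespace(operands=[
    SimpleNamespace(expressions=[_exp(ExpressionType.REGISTER, "rax")]),
    SimpleNamespace(expressions=[
        _exp(ExpressionType.SYMBOL, "+"),
        _exp(ExpressionType.REGISTER, "rbx"),
        _exp(ExpressionType.IMMEDIATE_INT, 8),
    ]),
])
used = registers_used(demo_inst)
print(used)
```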

enum binexport.types.FunctionType(value)[source]

Function types as defined by IDA

Valid values are as follows:

NORMAL = <FunctionType.NORMAL: 1>

Normal function

LIBRARY = <FunctionType.LIBRARY: 2>

library function

IMPORTED = <FunctionType.IMPORTED: 3>

imported function (has no content)

THUNK = <FunctionType.THUNK: 4>

thunk function (trampoline to another function)

INVALID = <FunctionType.INVALID: 5>

invalid function (as computed by IDA)

The Enum and its members also have the following methods:

static from_proto(function_type: BinExport2.CallGraph.Vertex.Type) FunctionType[source]
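To show how the function type can be used in practice, here is a minimal sketch that groups function names by their type. The enum is again a local stand-in mirroring the members listed above (the real one lives in binexport.types), and `names_by_type` together with the demo program are hypothetical.

```python
from collections import defaultdict
from enum import Enum
from types import SimpleNamespace


class FunctionType(Enum):
    """Local stand-in mirroring the documented members."""
    NORMAL = 1
    LIBRARY = 2
    IMPORTED = 3
    THUNK = 4
    INVALID = 5


def names_by_type(program):
    """Group function names by their FunctionType."""
    groups = defaultdict(list)
    for fun in program.values():
        groups[fun.type].append(fun.name)
    return {kind: sorted(names) for kind, names in groups.items()}


# Stand-in program (hypothetical): address -> object with `name` and `type`.
demo_program = {
    0x1000: SimpleNamespace(name="main", type=FunctionType.NORMAL),
    0x1100: SimpleNamespace(name="helper", type=FunctionType.NORMAL),
    0x2000: SimpleNamespace(name="printf", type=FunctionType.IMPORTED),
}
grouped = names_by_type(demo_program)
print(grouped[FunctionType.NORMAL])
```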