BinExport
BinExport is a utility hosted on Google’s GitHub that exports the disassembly obtained from IDA Pro, Binary Ninja and Ghidra into an external file. The file is defined in the Protobuf format. It is notably used by BinDiff as input for its diffing.
For more details: Project homepage
Limitations: although it works like a charm, BinExport essentially lacks the following two features:
Regardless of the disassembler (IDA, Ghidra, ...), BinExport has to be launched manually through the GUI, or via a shell command if the disassembler supports headless mode. There are no programmatic bindings to trigger an export.
The exported data is very useful, but since BinExport was defined as an internal format, there is no API to manipulate it.
As such, python-binexport has been developed to address these two weaknesses.
Python-Binexport
python-binexport is a Python module that aims to provide a friendly interface to load and manipulate BinExport files.
What is BinExport?
BinExport is a protobuf format used by BinDiff to extract IDA databases and process them outside of IDA. It gives a very compact (size-optimized) representation of the program.
Dependencies
Python-binexport can load any .BinExport file generated by one of the supported disassemblers: IDA, Ghidra and Binary Ninja.
However, to perform the export with binexporter or through the API ProgramBinExport.from_binary_file(), the BinExport IDA plugin must first be installed from its GitHub page, as IDA is the only disassembler supported for exporting at the moment.
This feature requires IDA >= 7.2 (as it calls the BinExportBinary IDC function).
[!WARNING] If you export files from python-binexport make sure the IDA Pro binexport plugin is properly installed and works when running it manually before trying to use it from the python library (it can hang if not properly installed).
[!NOTE] Exporting files with Ghidra or Binary Ninja from python-binexport might be supported in the future.
Installation
pip install python-binexport
Python module usage
The main intended usage of python-binexport is as a Python module. The main entry point is the class ProgramBinExport, which triggers the loading of the whole file. Here is a snippet that iterates over every expression of every instruction in the program:
from binexport import ProgramBinExport

p = ProgramBinExport("myprogram.BinExport")
for fun_addr, fun in p.items():
    with fun:  # Preload all the basic blocks
        for bb_addr, bb in fun.items():
            for inst_addr, inst in bb.instructions.items():
                for operand in inst.operands:
                    for exp in operand.expressions:
                        pass  # Do whatever at such deep level
ProgramBinExport, FunctionBinExport, InstructionBinExport and OperandBinExport all provide various attributes and methods to get their type and other information.
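As an illustration of the traversal above, here is a minimal sketch that tallies expressions by their type. It only relies on the attributes shown in the snippet plus the documented type property of expressions; the helper name count_expression_types is hypothetical and not part of python-binexport.

```python
from collections import Counter


def count_expression_types(program):
    """Count expressions by type over a ProgramBinExport-like mapping.

    `program` is assumed to map addresses to function objects that can be
    used as context managers and expose items(), as in the snippet above.
    """
    counts = Counter()
    for _fun_addr, fun in program.items():
        with fun:  # preload the basic blocks once for the whole traversal
            for _bb_addr, bb in fun.items():
                for _inst_addr, inst in bb.instructions.items():
                    for operand in inst.operands:
                        for exp in operand.expressions:
                            counts[exp.type] += 1
    return counts
```

With the library installed, the expected call would be count_expression_types(ProgramBinExport("myprogram.BinExport")), which should yield a Counter keyed by ExpressionType members.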
If the module idascript is installed, you can directly generate a BinExport file using the ProgramBinExport.from_binary_file static method.
Command line usage
The executable script binexporter provides a very basic utility to export a BinExport file straight from the command line (without having to launch IDA, etc.). It is basically a wrapper for ProgramBinExport.from_binary_file.
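Since an export launches IDA, it can be worth skipping it when the output already exists. The sketch below shows one way to do that. The helper name export_or_reuse and the "<binary name>.BinExport" naming convention are assumptions made for illustration. The exporter argument is expected to be ProgramBinExport.from_binary_file, whose keyword arguments (output_file, open_export, override) are taken from the API reference on this page.

```python
from pathlib import Path


def export_or_reuse(exec_file, exporter):
    """Return the .BinExport path for exec_file, exporting only if needed.

    `exporter` is assumed to be ProgramBinExport.from_binary_file, or any
    callable accepting the same keyword arguments.
    """
    exec_path = Path(exec_file)
    # Hypothetical convention: <binary name>.BinExport next to the binary.
    output = exec_path.with_name(exec_path.name + ".BinExport")
    if output.exists():
        return output  # reuse a previous export instead of launching IDA again
    # With open_export=False the documented return value is a success boolean.
    succeeded = exporter(exec_path, output_file=output,
                         open_export=False, override=False)
    if not succeeded:
        raise RuntimeError(f"export failed for {exec_path}")
    return output
```

Passing the exporter as a parameter keeps the helper importable on machines where idascript or IDA is not installed.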
API
Program
- class binexport.program.ProgramBinExport(file: Path | str)[source]¶
Bases: dict
Program class that wraps the binexport with high-level functions and an easy to use API. It inherits from a dict which is used to reference all functions based on their address.
- Parameters:
file – BinExport file path
- callgraph: networkx.DiGraph¶
program call graph (as a networkx DiGraph)
- static from_binary_file(exec_file: Path | str, output_file: str | Path = '', open_export: bool = True, override: bool = False) ProgramBinExport | bool [source]¶
Generate the .BinExport file for the given program and return an instance of ProgramBinExport.
Warning
This function requires the module idascript
- Parameters:
exec_file – executable file path
output_file – BinExport output file
open_export – whether or not to open the binexport after export
override – Override the .BinExport if already existing. (default false)
- Returns:
an instance of ProgramBinExport if open_export is true, else boolean on whether it succeeded
- fun_names: dict[str, FunctionBinExport]¶
dictionary mapping function name -> function
- path: pathlib.Path¶
BinExport file path
- property proto: BinExport2¶
Returns the protobuf object associated to the program
Function
- class binexport.function.FunctionBinExport(program: weakref.ref[ProgramBinExport], *, pb_fun: BinExport2.FlowGraph | None = None, is_import: bool = False, addr: Addr | None = None)[source]¶
Function object. Also references its parents and children (function it calls).
Constructor. Iterates over the FlowGraph structure and initializes all the basic blocks and instructions accordingly.
- Parameters:
program – weak reference to program (used to navigate pb fields contained inside)
pb_fun – FlowGraph protobuf structure
is_import – whether or not it’s an import function (if so does not initialize bb etc..)
addr – address of the function (info available in the call graph)
- property blocks: dict[Addr, BasicBlockBinExport]¶
Returns a dict that references all basic blocks by their address. Accessing this property also loads the CFG. By default the returned object is not cached, so accessing it multiple times rebuilds the same object each time. To cache it, use the function’s context manager or call FunctionBinExport.load. Example:
# func: FunctionBinExport
with func:  # Loading all the basic blocks
    for bb_addr, bb in func.blocks.items():  # Blocks are already loaded
        pass
    # The blocks are still loaded
    for bb_addr, bb in func.blocks.items():
        pass
# here the blocks have been unloaded
- Returns:
dictionary of addresses to basic blocks
- children: set[FunctionBinExport]¶
set of functions called by this one
- items() abc.ItemsView[Addr, BasicBlockBinExport] [source]¶
Each function is associated to a dictionary with key-value Addr->BasicBlockBinExport. This returns items of the dictionary.
- keys() abc.KeysView[Addr] [source]¶
Each function is associated to a dictionary with key-value Addr->BasicBlockBinExport. This returns the keys of the dictionary.
- parents: set[FunctionBinExport]¶
set of functions that call this one
- property program: ProgramBinExport¶
ProgramBinExport to which this function belongs.
- property type: FunctionType¶
Type of the function as a FunctionType
- Returns:
type enum of the function
- values() abc.ValuesView[BasicBlockBinExport] [source]¶
Each function is associated to a dictionary with key-value Addr->BasicBlockBinExport. This returns the values of the dictionary.
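The children and parents sets make it possible to walk the call graph without going through the protobuf data. The following minimal sketch collects every function reachable from a starting function by following children. The helper name transitive_callees is hypothetical, and the code assumes only that each function object is hashable and exposes a children set, as documented above.

```python
def transitive_callees(function):
    """Return every function reachable from `function` through `children`."""
    seen = set()
    stack = list(function.children)
    while stack:
        callee = stack.pop()
        if callee in seen:
            continue  # already visited; this also guards against call cycles
        seen.add(callee)
        stack.extend(callee.children)
    return seen
```

Swapping children for parents in the same loop would give the transitive callers instead.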
Basic Block
- class binexport.function.BasicBlockBinExport(program: weakref.ref[ProgramBinExport], function: weakref.ref[FunctionBinExport], pb_bb: BinExport2.BasicBlock)[source]¶
Bases: object
Basic block class.
- Parameters:
program – Weak reference to the program
function – Weak reference to the function
pb_bb – protobuf definition of the basic block
- addr: Addr¶
basic block address
- bytes¶
bytes of the basic block
- property function: FunctionBinExport¶
Wrapper on weak reference on FunctionBinExport
- Returns:
FunctionBinExport object, the function associated to the basic block
- property instructions: dict[Addr, InstructionBinExport]¶
Returns a dict that references all the instructions in this basic block by their address. The returned object is cached by default; to erase the cache, delete the attribute.
- Returns:
dictionary of addresses to instructions
- property program: ProgramBinExport¶
Wrapper on weak reference on ProgramBinExport
- Returns:
ProgramBinExport object, the program associated to the basic block
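To show how the basic block attributes fit together, here is a small sketch that summarizes a block using only the documented addr, bytes and instructions attributes. The helper name block_summary and the shape of the returned dictionary are assumptions made for illustration.

```python
def block_summary(bb):
    """Summarize a BasicBlockBinExport-like object as a plain dictionary."""
    return {
        "addr": bb.addr,                       # address of the basic block
        "instructions": len(bb.instructions),  # number of instructions it holds
        "size": len(bb.bytes),                 # size of the block in bytes
    }
```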
Instruction
- class binexport.instruction.InstructionBinExport(program: weakref.ref[ProgramBinExport], function: weakref.ref[FunctionBinExport], addr: Addr, i_idx: int)[source]¶
Instruction class. It represents an instruction with its operands.
- Parameters:
program – Weak reference to the program
function – Weak reference to the function
addr – address of the instruction (computed outside)
i_idx – instruction index in the protobuf data structure
- addr: Addr¶
instruction address
- bytes¶
bytes of the instruction (opcodes)
- property operands: list[OperandBinExport]¶
Returns a list of the operands, instantiated dynamically on demand. The list is cached by default; to erase the cache, delete the attribute.
- Returns:
list of operands
- property pb_instr: BinExport2.Instruction¶
Protobuf instruction object.
- property program: ProgramBinExport¶
Program associated with this instruction.
Operand
- class binexport.operand.OperandBinExport(program: weakref.ref[ProgramBinExport], function: weakref.ref[FunctionBinExport], instruction: weakref.ref[InstructionBinExport], op_idx: int)[source]¶
Operand object. Provides access to the underlying expressions.
- Parameters:
program – Weak reference to the program
function – Weak reference to the function
instruction – Weak reference to the instruction
op_idx – operand index in protobuf structure
- property expressions: list[ExpressionBinExport]¶
Iterates over all the operand expressions in pre-order (binary operator first). The list is cached by default; to erase the cache, delete the attribute.
- Returns:
list of expressions
- property function: FunctionBinExport¶
Function object associated to this operand.
- property instruction: InstructionBinExport¶
Instruction object associated to this operand.
- property pb_operand: BinExport2.Operand¶
Protobuf operand object in the protobuf structure.
- property program: ProgramBinExport¶
Program object associated to this operand.
Expression
- class binexport.expression.ExpressionBinExport(program: ProgramBinExport, function: FunctionBinExport, instruction: InstructionBinExport, exp_idx: int, parent: ExpressionBinExport | None = None)[source]¶
Class that represents an expression node in the expression tree of a specific operand. The tree is inverted (each node has an edge to its parent).
- Parameters:
program – reference to program
function – reference to function
instruction – reference to instruction
exp_idx – expression index in the protobuf table
parent – reference to the parent expression in the tree. None if it is the root.
- parent: ExpressionBinExport | None¶
parent expression if nested
- property type: ExpressionType¶
Returns the type as defined in ExpressionType of the expression, after the protobuf parsing
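Because the tree is inverted, the natural traversal goes from a node up to the root by following parent. The sketch below collects that path. The helper name path_to_root is hypothetical, and the code relies only on the documented parent attribute, which is None at the root.

```python
def path_to_root(expression):
    """Return the expression followed by its ancestors, ending at the root."""
    path = []
    node = expression
    while node is not None:
        path.append(node)
        node = node.parent  # None once the root has been reached
    return path
```

The nesting depth of an expression is then len(path_to_root(exp)) - 1, with 0 for a root expression.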
Types
- binexport.types.Addr¶
An integer representing an address within a program
- enum binexport.types.ExpressionType(value)[source]¶
Expression type derived from protobuf expression types.
Valid values are as follows:
- FUNC_NAME = <ExpressionType.FUNC_NAME: 1>¶
function name
- VAR_NAME = <ExpressionType.VAR_NAME: 2>¶
variable name
- IMMEDIATE_INT = <ExpressionType.IMMEDIATE_INT: 3>¶
immediate value
- IMMEDIATE_FLOAT = <ExpressionType.IMMEDIATE_FLOAT: 4>¶
float expression
- SYMBOL = <ExpressionType.SYMBOL: 5>¶
symbol expression
- REGISTER = <ExpressionType.REGISTER: 6>¶
register expression
- SIZE = <ExpressionType.SIZE: 7>¶
size expression (byte, dword ..)
- enum binexport.types.FunctionType(value)[source]¶
Function types as defined by IDA
Valid values are as follows:
- NORMAL = <FunctionType.NORMAL: 1>¶
Normal function
- LIBRARY = <FunctionType.LIBRARY: 2>¶
library function
- IMPORTED = <FunctionType.IMPORTED: 3>¶
imported function (has no content)
- THUNK = <FunctionType.THUNK: 4>¶
thunk function (trampoline to another function)
- INVALID = <FunctionType.INVALID: 5>¶
invalid function (as computed by IDA)
The Enum and its members also have the following methods:
- static from_proto(function_type: BinExport2.CallGraph.Vertex.Type) FunctionType [source]¶
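A common use of FunctionType is to separate functions that have a body from imports or thunks. The following minimal sketch groups function addresses by type. The helper name group_addresses_by_type is hypothetical, and the code assumes only that the program behaves like a dict of address -> function and that each function exposes the documented type property.

```python
from collections import defaultdict


def group_addresses_by_type(program):
    """Group function addresses by the value of each function's `type`."""
    groups = defaultdict(list)
    for addr, fun in program.items():
        groups[fun.type].append(addr)
    return dict(groups)
```

With the library installed, the keys would be FunctionType members, so the addresses of imported functions would be found under FunctionType.IMPORTED when any exist.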