Bindiff
BinDiff is one of the oldest and most widely used differs in the reverse engineering community. First developed at Zynamics, it passed to Google when Google acquired Zynamics. It establishes matches between the functions of two binaries based on properties of their call graphs. For more details about BinDiff heuristics, have a look at one of the first papers or at the documentation.
Limitations: BinDiff has primarily been designed to be used through its main Java GUI application. A diff can be triggered from the command line, but no API has been implemented to trigger a diff programmatically. Similarly, no API allows manipulating the diff result itself.
As such, python-bindiff has been developed to provide a Python API to trigger a diff and manipulate its content.
Python Bindiff
python-bindiff is a Python module aiming to provide a friendly interface to launch and manipulate BinDiff diffs between two binary files.
How it works
The module relies on python-binexport to extract the programs' .BinExport files, and then directly interacts with the BinDiff differ (from Zynamics) to perform the diff. The generated diff file is then correlated with the two binaries so that the changes can be navigated.
Installation
The Python module requires BinDiff, so first follow the Zynamics installation instructions.
Then the Python module can be installed with:
pip install python-bindiff
The Python module needs to execute the differ executable, so it should be available either in the $PATH or via the BINDIFF_PATH environment variable.
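To check this from Python, the library's own BinDiff.is_installation_ok() (documented in the API section) is the supported route. The sketch below only illustrates the lookup order described above, under two assumptions: that BINDIFF_PATH names the directory holding the differ, and that the executable is called `bindiff` (pass another name if your installation differs). find_differ is a hypothetical helper, not part of python-bindiff.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_differ(binary_name: str = "bindiff") -> Optional[Path]:
    """Locate the differ executable (hypothetical helper, not python-bindiff's code).

    Assumption: BINDIFF_PATH points to the directory containing the differ;
    the default binary name is also an assumption.
    """
    env_dir = os.environ.get("BINDIFF_PATH")
    if env_dir:
        candidate = Path(env_dir) / binary_name
        # Only accept a regular file that we are allowed to execute.
        if candidate.is_file() and os.access(candidate, os.X_OK):
            return candidate
    # Fall back to a regular $PATH lookup.
    found = shutil.which(binary_name)
    return Path(found) if found else None
```

A None result corresponds to the situation in which the library raises BindiffNotFound.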
Usage as a Python module
The simplest way to get the programs diffed is:
from bindiff import BinDiff
diff = BinDiff("sample1.BinExport", "sample2.BinExport", "diff.BinDiff")
print(diff.similarity, diff.confidence)
# do whatever you want with diff.primary and diff.secondary, which are the
# two Program objects
Programs can also be instantiated separately:
from binexport import ProgramBinExport
from bindiff import BinDiff
p1 = ProgramBinExport("sample1.BinExport")
p2 = ProgramBinExport("sample2.BinExport")
diff = BinDiff(p1, p2, "diff.BinDiff")
Note that all the diff data is embedded inside the program objects, so after instantiating BinDiff, p1 and p2 are modified.
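One way to picture this in-place modification is Python's ability to reassign an existing object's __class__ to a compatible subclass, so that every existing reference (such as p1) sees the new type and its extra attributes. This is a hedged illustration of the observable behavior, not a description of python-bindiff's internals; Program, DiffScoresMixin, ProgramDiffed and attach_diff are hypothetical minimal names.

```python
class Program:
    """Hypothetical stand-in for a loaded program."""

    def __init__(self, name: str) -> None:
        self.name = name


class DiffScoresMixin:
    """Hypothetical mixin adding diff-related attributes."""

    similarity: float = 0.0
    confidence: float = 0.0


class ProgramDiffed(Program, DiffScoresMixin):
    """Program subclass that also carries diff results."""


def attach_diff(program: Program, similarity: float, confidence: float) -> Program:
    # Switch the class of the existing object in place: no copy is made,
    # so the caller's variable now refers to a ProgramDiffed instance.
    program.__class__ = ProgramDiffed
    program.similarity = similarity
    program.confidence = confidence
    return program
```

The caller does not need the return value: after attach_diff(p1, ...), p1 itself has changed type.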
From the API it is also possible to directly perform the BinExport extraction and the diffing:
from bindiff import BinDiff
diff = BinDiff.from_binary_files("sample1.exe", "sample2.exe", "out.BinDiff")
# or performing the diff on BinExport files
diff = BinDiff.from_binexport_files("sample1.BinExport", "sample2.BinExport", "out.BinDiff")
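Once a diff is loaded, a common next step is listing the functions that changed. With python-bindiff, the input rows could be built from the API as [(m.name1, m.name2, m.similarity) for _, _, m in diff.iter_function_matches()]. The sketch below takes plain tuples instead so it runs without the library; the sample rows are made up, changed_functions is a hypothetical helper, and treating a similarity of 1.0 as "unchanged" is an assumption.

```python
from typing import Iterable, List, Tuple

# (primary function name, secondary function name, similarity)
MatchRow = Tuple[str, str, float]


def changed_functions(rows: Iterable[MatchRow], threshold: float = 1.0) -> List[MatchRow]:
    """Keep matches whose similarity is below `threshold`, least similar first."""
    changed = [row for row in rows if row[2] < threshold]
    return sorted(changed, key=lambda row: row[2])


# Made-up sample rows standing in for real FunctionMatch data.
rows = [("main", "main", 1.0), ("parse", "parse", 0.72), ("check", "check_v2", 0.91)]
for name1, name2, similarity in changed_functions(rows):
    print(f"{name1} -> {name2}: {similarity:.2f}")
```

Sorting by ascending similarity puts the most heavily modified functions first, which is usually where patch analysis starts.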
Usage as a command line tool
The bindiffer command line tool can generate a diff file from two .BinExport files or directly from the binaries (thanks to python-binexport and idascript). Its help message is the following:
Usage: bindiffer [OPTIONS] <primary file> <secondary file>
bindiffer is a very simple utility to diff two binary files using BinDiff in command line. The two input files can be either binary files (in which case
IDA is used) or directly .BinExport file (solely BinDiff is used).
Options:
-i, --ida-path PATH IDA Pro installation directory
-b, --bindiff-path PATH BinDiff differ directory
-t, --type <type> inputs files type ('bin', 'binexport') [default:'bin']
-o, --output PATH Output file matching
-h, --help Show this message and exit.
For the tool to work, the BinDiff differ binary must be in the $PATH, given via the BINDIFF_PATH environment variable, or given with the -b command option.
Similarly, when diffing binaries directly, the ida64 binary must be available in the $PATH, given with the IDA_PATH environment variable, or given via the -i command option.
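When driving bindiffer from a script, the option table above maps directly onto an argument list. The sketch below is hypothetical (bindiffer_command and run_and_report are not part of python-bindiff): it uses only the flags shown in the help message, always passes -t and -o explicitly, and reports success as a boolean instead of raising.

```python
import subprocess
from typing import List, Optional, Sequence


def bindiffer_command(primary: str, secondary: str, output: str,
                      file_type: str = "bin",
                      ida_path: Optional[str] = None,
                      bindiff_path: Optional[str] = None) -> List[str]:
    """Build a bindiffer argument list using only the options from its help message."""
    if file_type not in ("bin", "binexport"):
        raise ValueError(f"unsupported input type: {file_type!r}")
    cmd = ["bindiffer", "-t", file_type, "-o", output]
    if ida_path is not None:
        cmd += ["-i", ida_path]        # IDA Pro installation directory
    if bindiff_path is not None:
        cmd += ["-b", bindiff_path]    # BinDiff differ directory
    return cmd + [primary, secondary]  # positional: <primary file> <secondary file>


def run_and_report(cmd: Sequence[str], timeout: float = 3600.0) -> bool:
    """Run a command; True on exit status 0, False on failure, timeout or missing binary."""
    try:
        completed = subprocess.run(list(cmd), capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers a missing or non-executable program.
        return False
    return completed.returncode == 0
```

For example, run_and_report(bindiffer_command("sample1.exe", "sample2.exe", "out.BinDiff")) would run the full IDA plus BinDiff pipeline, assuming bindiffer is on the $PATH.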
API
BinDiff
- class bindiff.BinDiff(primary: ProgramBinExport | str, secondary: ProgramBinExport | str, diff_file: str)
BinDiff class. Parse the diffing result of BinDiff and apply it to the two ProgramBinExport objects given. All the diff results are embedded in the two program objects, so after loading, the class can be dropped if needed.
Warning
The two programs given are mutated into ProgramBinDiff objects, which inherit SimilarityMixin and DictMatchMixin and thereby gain additional attributes and methods.
- Parameters:
primary – first program diffed
secondary – second program diffed
diff_file – diffing file as generated by BinDiff (more specifically, by its differ)
- static assert_installation_ok() -> None
Assert that BinDiff is installed.
- Raises:
BindiffNotFound – if the BinDiff binary cannot be found
- static from_binary_files(p1_path: str, p2_path: str, diff_out: str, override: bool = False) -> BinDiff | None
Diff two executable files: export .BinExport files from IDA, then diff the two resulting files with BinDiff.
- Parameters:
p1_path – primary binary file to diff
p2_path – secondary binary file to diff
diff_out – output file for the diff
override – override existing BinExport files and diffing results
- Returns:
BinDiff object representing the diff
- static from_binexport_files(p1_binexport: ProgramBinExport | str, p2_binexport: ProgramBinExport | str, diff_out: str, override: bool = False) -> BinDiff | None
Diff two BinExport files with BinDiff, then load a BinDiff instance.
- Parameters:
p1_binexport – primary BinExport file to diff (path or object)
p2_binexport – secondary BinExport file to diff (path or object)
diff_out – output file for the diff
override – override existing BinExport files and diffing results
- Returns:
BinDiff object representing the diff
- get_match(function: FunctionBinExport) -> tuple[FunctionBinExport, FunctionMatch] | None
Get the function that matches the provided one.
- Parameters:
function – A function that belongs either to primary or secondary
- Returns:
A tuple with the matched function and the match object if there is a match for the provided function, otherwise None
- static is_installation_ok() -> bool
Check that BinDiff is properly installed and can be found on the system.
- Returns:
True if the BinDiff binary can be found.
- is_matched(function: FunctionBinExport) -> bool
- Parameters:
function – A function that belongs either to primary or secondary.
- Returns:
True if there is a match for the provided function, False otherwise
- iter_basicblock_matches(function1: FunctionBinExport, function2: FunctionBinExport) -> list[tuple[BasicBlockBinExport, BasicBlockBinExport, BasicBlockMatch]]
Return a list of all the matched basic blocks between the two provided functions. Each element of the list is a tuple containing the basic blocks of the primary and secondary functions and the BasicBlockMatch object describing the match. The first function must be part of the primary program, while the second function must be part of the secondary program.
- Parameters:
function1 – A function of the primary program
function2 – A function of the secondary program
- Returns:
list of tuples, each containing the primary basic block, the secondary basic block and the BasicBlockMatch object
- iter_function_matches() -> list[tuple[FunctionBinExport, FunctionBinExport, FunctionMatch]]
Return a list of all the matched functions. Each element of the list is a tuple containing the function in the primary program, the matched function in the secondary program and the FunctionMatch object describing the match.
- Returns:
list of tuples, each containing the primary function, the secondary function and the FunctionMatch object
- iter_instruction_matches(block1: BasicBlockBinExport, block2: BasicBlockBinExport) -> list[tuple[InstructionBinExport, InstructionBinExport]]
Return a list of all the matched instructions between the two provided basic blocks. Each element of the list is a tuple containing the instructions of the primary and secondary basic blocks. The first basic block must belong to the primary program, while the second one must be part of the secondary program.
- Parameters:
block1 – A basic block belonging to the primary program
block2 – A basic block belonging to the secondary program
- Returns:
list of tuples, each containing the primary instruction and the secondary instruction
- primary
Primary BinExport object
- primary_unmatched_basic_block(function: FunctionBinExport) -> list[BasicBlockBinExport]
Return a list of the unmatched basic blocks in the provided function. The function must be part of the primary program.
- Parameters:
function – A function of the primary program
- Returns:
list of unmatched basic blocks
- primary_unmatched_function() -> list[FunctionBinExport]
Return a list of the unmatched functions in the primary program.
- Returns:
list of unmatched functions in primary
- primary_unmatched_instruction(bb: BasicBlockBinExport) -> list[InstructionBinExport]
Return a list of the unmatched instructions in the provided basic block. The basic block must be part of the primary program.
- Parameters:
bb – A basic block belonging to the primary program
- Returns:
list of unmatched instructions
- static raw_diffing(p1_path: Path | str, p2_path: Path | str, out_diff: str) -> bool
Static method to diff two BinExport files against each other and store the diffing result in the given file.
- Parameters:
p1_path – primary file path
p2_path – secondary file path
out_diff – diffing output file
- Returns:
True if successful, False otherwise
- secondary
Secondary BinExport object
- secondary_unmatched_basic_block(function: FunctionBinExport) -> list[BasicBlockBinExport]
Return a list of the unmatched basic blocks in the provided function. The function must be part of the secondary program.
- Parameters:
function – A function of the secondary program
- Returns:
list of unmatched basic blocks
- secondary_unmatched_function() -> list[FunctionBinExport]
Return a list of the unmatched functions in the secondary program.
- Returns:
list of unmatched functions in secondary
- secondary_unmatched_instruction(bb: BasicBlockBinExport) -> list[InstructionBinExport]
Return a list of the unmatched instructions in the provided basic block. The basic block must be part of the secondary program.
- Parameters:
bb – A basic block belonging to the secondary program
- Returns:
list of unmatched instructions
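Several of the methods above (get_match, is_matched and the *_unmatched_function helpers) boil down to address lookups against per-side match indexes, like the primary_functions_match and secondary_functions_match dictionaries of BindiffFile. The sketch below illustrates that idea under simplifying assumptions: functions are identified by integer start address, the caller states which side an address comes from, and Match and MatchIndex are hypothetical classes, not python-bindiff's implementation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Match:
    """Hypothetical stand-in for FunctionMatch, reduced to the fields used here."""
    address1: int  # function address in primary
    address2: int  # function address in secondary
    similarity: float


class MatchIndex:
    """Index function matches by address on both sides (illustrative only)."""

    def __init__(self, matches: Iterable[Match]) -> None:
        self.by_primary: Dict[int, Match] = {}
        self.by_secondary: Dict[int, Match] = {}
        for m in matches:
            self.by_primary[m.address1] = m
            self.by_secondary[m.address2] = m

    def get_match(self, address: int, in_primary: bool) -> Optional[Tuple[int, Match]]:
        """Return (address of the function on the other side, match), or None."""
        if in_primary:
            m = self.by_primary.get(address)
            return None if m is None else (m.address2, m)
        m = self.by_secondary.get(address)
        return None if m is None else (m.address1, m)

    def is_matched(self, address: int, in_primary: bool) -> bool:
        return self.get_match(address, in_primary) is not None

    def unmatched(self, addresses: Iterable[int], in_primary: bool) -> List[int]:
        """Addresses with no match on the given side, in ascending order."""
        index = self.by_primary if in_primary else self.by_secondary
        return sorted(a for a in addresses if a not in index)
```

Dictionary lookups keep each query constant-time on average, which matters when walking every function of a large binary.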
BinDiff File
- class bindiff.file.BindiffFile(file: Path | str, permission: str = 'ro')
BinDiff database file. The class seamlessly parses the database and allows retrieving and manipulating the results.
It also provides some methods to create a database and to add entries to it.
- Parameters:
file – path to the BinDiff database
permission – permission to use for opening the database (default: ro)
- add_basic_block_match(fun_addr1: int, fun_addr2: int, bb_addr1: int, bb_addr2: int) -> int
Add a basic block match to the database.
- Parameters:
fun_addr1 – address of the basic block's function in primary
fun_addr2 – address of the basic block's function in secondary
bb_addr1 – basic block address in primary
bb_addr2 – basic block address in secondary
- Returns:
id of the row inserted in the database.
- add_function_match(fun_addr1: int, fun_addr2: int, fun_name1: str, fun_name2: str, similarity: float, confidence: float = 0.0, identical_bbs: int = 0) -> int
Add a function match to the database.
- Parameters:
fun_addr1 – primary function address
fun_addr2 – secondary function address
fun_name1 – primary function name
fun_name2 – secondary function name
similarity – similarity score between the two functions
confidence – confidence score between the two functions
identical_bbs – number of identical basic blocks
- Returns:
id of the row inserted in the database.
- add_instruction_match(entry: int, inst_addr1: int, inst_addr2: int) -> None
Add an instruction match to the database.
- Parameters:
entry – basic block match identifier in the database
inst_addr1 – instruction address in primary
inst_addr2 – instruction address in secondary
- property basicblock_matches: list[BasicBlockMatch]
Returns the list of matched basic blocks in primary (and secondary)
- static create(filename: str, primary: str, secondary: str, version: str, desc: str, similarity: float, confidence: float) -> BindiffFile
Create a new BinDiff database object in the file given in filename. It only takes two binaries.
- Parameters:
filename – database file path
primary – path to primary binary
secondary – path to secondary binary
version – version of the differ used
desc – description of the database
similarity – similarity score between the two binaries
confidence – confidence of results
- Returns:
instance of BindiffFile (ready to be filled)
- property function_matches: list[FunctionMatch]
Returns the list of matched functions
- static init_database(db: Connection) -> None
Initialize the database by creating all the tables
- primary_functions_match: dict[int, FunctionMatch]
FunctionMatch indexed by addresses in primary
- secondary_functions_match: dict[int, FunctionMatch]
FunctionMatch indexed by addresses in secondary
- property unmatched_primary_count: int
Returns the number of functions inside primary that are not matched
- property unmatched_secondary_count: int
Returns the number of functions inside secondary that are not matched
- update_file_infos(entry_id: int, fun_count: int, lib_count: int, bb_count: int, inst_count: int) -> None
Update information about a binary in the database (function, basic block count …)
- Parameters:
entry_id – entry of the binary in the database (row id)
fun_count – number of functions
lib_count – number of functions flagged as libraries
bb_count – number of basic blocks
inst_count – number of instructions
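Since a .BinDiff file is an SQLite database (hence the Connection argument of init_database), writing matches comes down to inserting rows and chaining their row ids: add_function_match returns an id, and the id returned by add_basic_block_match is the entry expected by add_instruction_match. The sketch below illustrates that flow with Python's sqlite3 module, then reopens the file read-only to mirror the 'ro' default permission. The schema is a deliberately reduced, hypothetical one loosely modeled on BinDiff's tables; the standalone functions only borrow the method names, and mapping permission onto SQLite's URI mode parameter is an assumption about how such an option can work.

```python
import sqlite3
from pathlib import Path
from typing import Union

# Hypothetical reduced schema: BinDiff's real tables have more columns.
SCHEMA = """
CREATE TABLE function (id INTEGER PRIMARY KEY, address1 INTEGER, address2 INTEGER,
                       name1 TEXT, name2 TEXT, similarity REAL, confidence REAL);
CREATE TABLE basicblock (id INTEGER PRIMARY KEY, functionid INTEGER,
                         address1 INTEGER, address2 INTEGER);
CREATE TABLE instruction (basicblockid INTEGER, address1 INTEGER, address2 INTEGER);
"""


def open_database(path: Union[Path, str], permission: str = "ro") -> sqlite3.Connection:
    # Assumption: map the permission string onto SQLite's URI `mode` parameter.
    if permission not in ("ro", "rw", "rwc"):
        raise ValueError(f"unsupported permission: {permission!r}")
    uri = f"{Path(path).resolve().as_uri()}?mode={permission}"
    return sqlite3.connect(uri, uri=True)


def add_function_match(db, addr1, addr2, name1, name2, similarity, confidence=0.0):
    cur = db.execute(
        "INSERT INTO function (address1, address2, name1, name2, similarity, confidence)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (addr1, addr2, name1, name2, similarity, confidence))
    return cur.lastrowid  # id of the inserted row


def add_basic_block_match(db, fun_addr1, fun_addr2, bb_addr1, bb_addr2):
    # Resolve the parent function match from its two addresses.
    row = db.execute("SELECT id FROM function WHERE address1 = ? AND address2 = ?",
                     (fun_addr1, fun_addr2)).fetchone()
    if row is None:
        raise LookupError("add the function match first")
    cur = db.execute(
        "INSERT INTO basicblock (functionid, address1, address2) VALUES (?, ?, ?)",
        (row[0], bb_addr1, bb_addr2))
    return cur.lastrowid  # the `entry` used for instruction matches


def add_instruction_match(db, entry, inst_addr1, inst_addr2):
    db.execute("INSERT INTO instruction (basicblockid, address1, address2) VALUES (?, ?, ?)",
               (entry, inst_addr1, inst_addr2))
```

Typical use: open with "rwc", run executescript(SCHEMA), add a function match, then a basic block match, then its instructions, commit, and later reopen with the default "ro" for reading.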
Types
- class bindiff.file.File(id: int, filename: str, exefilename: str, hash: str, functions: int, libfunctions: int, calls: int, basicblocks: int, libbasicblocks: int, edges: int, libedges: int, instructions: int, libinstructions: int)
File diffed in the database.
- class bindiff.file.FunctionMatch(id: int, address1: int, name1: str, address2: int, name2: str, similarity: float, confidence: float, algorithm: FunctionAlgorithm)
A match between two functions in the database.
- algorithm: FunctionAlgorithm
algorithm used for the match
- class bindiff.file.BasicBlockMatch(id: int, function_match: FunctionMatch, address1: int, address2: int, algorithm: BasicBlockAlgorithm)
A match between two basic blocks
- algorithm: BasicBlockAlgorithm
algorithm used to match the basic blocks
- function_match: FunctionMatch
FunctionMatch associated with this match
- enum bindiff.types.BasicBlockAlgorithm(value)
Bases: IntEnum
Basic block matching algorithm enum. (IDs do not seem to change in BinDiff, so they are hardcoded here.)
- Member Type: int
Valid values are as follows:
- edges_prime_product = <BasicBlockAlgorithm.edges_prime_product: 1>
- hash_matching_four_inst_min = <BasicBlockAlgorithm.hash_matching_four_inst_min: 2>
- prime_matching_four_inst_min = <BasicBlockAlgorithm.prime_matching_four_inst_min: 3>
- call_reference_matching = <BasicBlockAlgorithm.call_reference_matching: 4>
- string_references_matching = <BasicBlockAlgorithm.string_references_matching: 5>
- edges_md_index_top_down = <BasicBlockAlgorithm.edges_md_index_top_down: 6>
- md_index_matching_top_down = <BasicBlockAlgorithm.md_index_matching_top_down: 7>
- edges_md_index_bottom_up = <BasicBlockAlgorithm.edges_md_index_bottom_up: 8>
- md_index_matching_bottom_up = <BasicBlockAlgorithm.md_index_matching_bottom_up: 9>
- relaxed_md_index_matching = <BasicBlockAlgorithm.relaxed_md_index_matching: 10>
- prime_matching_no_inst_min = <BasicBlockAlgorithm.prime_matching_no_inst_min: 11>
- edges_lengauer_tarjan_dominated = <BasicBlockAlgorithm.edges_lengauer_tarjan_dominated: 12>
- loop_entry_matching = <BasicBlockAlgorithm.loop_entry_matching: 13>
- self_loop_matching = <BasicBlockAlgorithm.self_loop_matching: 14>
- entry_point_matching = <BasicBlockAlgorithm.entry_point_matching: 15>
- exit_point_matching = <BasicBlockAlgorithm.exit_point_matching: 16>
- instruction_count_matching = <BasicBlockAlgorithm.instruction_count_matching: 17>
- jump_sequence_matching = <BasicBlockAlgorithm.jump_sequence_matching: 18>
- propagation_size_one = <BasicBlockAlgorithm.propagation_size_one: 19>
- manual = <BasicBlockAlgorithm.manual: 20>
- exception bindiff.types.BindiffNotFound
Bases: Exception
Exception raised if the BinDiff binary cannot be found when trying to diff two binaries.
- enum bindiff.types.FunctionAlgorithm(value)
Bases: IntEnum
Function matching algorithm enum. (IDs do not seem to change in BinDiff, so they are hardcoded here.)
- Member Type: int
Valid values are as follows:
- name_hash_matching = <FunctionAlgorithm.name_hash_matching: 1>
- hash_matching = <FunctionAlgorithm.hash_matching: 2>
- edges_flowgraph_md_index = <FunctionAlgorithm.edges_flowgraph_md_index: 3>
- edges_callgraph_md_index = <FunctionAlgorithm.edges_callgraph_md_index: 4>
- md_index_matching_flowgraph_top_down = <FunctionAlgorithm.md_index_matching_flowgraph_top_down: 5>
- md_index_matching_flowgraph_bottom_up = <FunctionAlgorithm.md_index_matching_flowgraph_bottom_up: 6>
- prime_signature_matching = <FunctionAlgorithm.prime_signature_matching: 7>
- md_index_matching_callGraph_top_down = <FunctionAlgorithm.md_index_matching_callGraph_top_down: 8>
- md_index_matching_callGraph_bottom_up = <FunctionAlgorithm.md_index_matching_callGraph_bottom_up: 9>
- relaxed_md_index_matching = <FunctionAlgorithm.relaxed_md_index_matching: 10>
- instruction_count = <FunctionAlgorithm.instruction_count: 11>
- address_sequence = <FunctionAlgorithm.address_sequence: 12>
- string_references = <FunctionAlgorithm.string_references: 13>
- loop_count_matching = <FunctionAlgorithm.loop_count_matching: 14>
- call_sequence_matching_exact = <FunctionAlgorithm.call_sequence_matching_exact: 15>
- call_sequence_matching_topology = <FunctionAlgorithm.call_sequence_matching_topology: 16>
- call_sequence_matching_sequence = <FunctionAlgorithm.call_sequence_matching_sequence: 17>
- call_reference_matching = <FunctionAlgorithm.call_reference_matching: 18>
- manual = <FunctionAlgorithm.manual: 19>
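Because FunctionAlgorithm is an IntEnum, an integer algorithm ID converts back to a named member simply by calling the enum, which makes per-algorithm statistics straightforward. To run without python-bindiff installed, the sketch re-declares the values listed above in a local class; with the library you would import bindiff.types.FunctionAlgorithm instead. algorithm_histogram is a hypothetical helper.

```python
from collections import Counter
from enum import IntEnum
from typing import Iterable


class FunctionAlgorithm(IntEnum):
    """Local copy of bindiff.types.FunctionAlgorithm (values as documented above)."""
    name_hash_matching = 1
    hash_matching = 2
    edges_flowgraph_md_index = 3
    edges_callgraph_md_index = 4
    md_index_matching_flowgraph_top_down = 5
    md_index_matching_flowgraph_bottom_up = 6
    prime_signature_matching = 7
    md_index_matching_callGraph_top_down = 8
    md_index_matching_callGraph_bottom_up = 9
    relaxed_md_index_matching = 10
    instruction_count = 11
    address_sequence = 12
    string_references = 13
    loop_count_matching = 14
    call_sequence_matching_exact = 15
    call_sequence_matching_topology = 16
    call_sequence_matching_sequence = 17
    call_reference_matching = 18
    manual = 19


def algorithm_histogram(raw_ids: Iterable[int]) -> Counter:
    """Count matches per algorithm name; an unknown ID raises ValueError."""
    return Counter(FunctionAlgorithm(i).name for i in raw_ids)
```

Failing loudly on an unknown ID is deliberate: since the IDs are hardcoded, a new value would signal that the BinDiff version in use has diverged from this table.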