BinDiff

BinDiff is one of the oldest and most widely used differs in the reverse engineering community. It was first developed at Zynamics, which was later acquired by Google. The differ relies on properties of the call graph to establish matches between the functions of two binaries. For more details about BinDiff heuristics, have a look at one of the first papers or the documentation.

Limitations: BinDiff has primarily been designed to be used through its main Java GUI application. A diff can be triggered from the command line, but no API has been implemented to trigger a diff programmatically. Similarly, no API enables manipulating the diff result itself.

As such, python-bindiff has been developed to provide a Python API for triggering a diff and manipulating its content.


Python Bindiff

python-bindiff is a Python module that aims to provide a friendly interface to launch BinDiff and manipulate the resulting diff between two binary files.

How it works

The module relies on python-binexport to extract the .BinExport file of each program and then interacts directly with the binary differ (from Zynamics) to perform the diff. The generated diff file is then correlated with the two binaries so that the changes can be navigated.

Installation

The Python module requires BinDiff, so first follow the Zynamics installation instructions.

Then the Python module can be installed with:

pip install python-bindiff

The Python module needs to run the differ executable. As such, it should be available:

  • either in the path

  • or via the BINDIFF_PATH environment variable
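The lookup order described above can be sketched with the standard library only. This is not the module's actual implementation: the helper name locate_differ, the executable name "bindiff" and the assumption that BINDIFF_PATH points to the directory holding the executable are illustrative choices.

```python
import os
import shutil
from pathlib import Path


def locate_differ(environ=None, exe_name="bindiff"):
    """Return the path of the differ executable, or None if it is not found.

    Hypothetical helper: `exe_name` and the directory semantics of
    BINDIFF_PATH are assumptions made for this sketch.
    """
    environ = os.environ if environ is None else environ

    # 1. Directory given through the BINDIFF_PATH environment variable.
    custom_dir = environ.get("BINDIFF_PATH")
    if custom_dir:
        candidate = Path(custom_dir) / exe_name
        if candidate.is_file():
            return str(candidate)

    # 2. Regular lookup in the directories listed in PATH.
    return shutil.which(exe_name, path=environ.get("PATH"))
```

Calling locate_differ() without arguments inspects the current process environment; a None result means that neither location provides the executable.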

Usage as a python module

The simplest way to get the programs diffed is:

from bindiff import BinDiff

diff = BinDiff("sample1.BinExport", "sample2.BinExport", "diff.BinDiff")
print(diff.similarity, diff.confidence)
# do whatever you want with diff.primary and diff.secondary, which are the
# two Program objects

But programs can also be instantiated separately:

from binexport import ProgramBinExport
from bindiff import BinDiff
p1 = ProgramBinExport("sample1.BinExport")
p2 = ProgramBinExport("sample2.BinExport")

diff = BinDiff(p1, p2, "diff.BinDiff")

Note that all the diff data is embedded inside the program objects, so after instantiating BinDiff, p1 and p2 are modified.

From the API it is also possible to directly perform the BinExport extraction and the diffing:

from bindiff import BinDiff

diff = BinDiff.from_binary_files("sample1.exe", "sample2.exe", "out.BinDiff")

# or performing the diff on BinExport files
diff = BinDiff.from_binexport_files("sample1.BinExport", "sample2.BinExport", "out.BinDiff")
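Since from_binexport_files is documented as returning BinDiff | None, callers may want to guard the call. The following is a minimal sketch; the helper name diff_binexports is hypothetical, and the class is passed as a parameter only so that the snippet does not need BinDiff installed to be read or tested.

```python
def diff_binexports(differ_cls, primary, secondary, out_file):
    """Diff two .BinExport files, returning None when no diff is available.

    `differ_cls` is expected to expose is_installation_ok() and
    from_binexport_files(), as bindiff.BinDiff does in the API reference.
    """
    # Check the installation first so a missing differ gives a plain None
    # rather than an exception raised in the middle of the diffing.
    if not differ_cls.is_installation_ok():
        return None
    # The documented return type is `BinDiff | None`; it is forwarded as is.
    return differ_cls.from_binexport_files(primary, secondary, out_file)
```

With the real module, this would be called as diff_binexports(BinDiff, "sample1.BinExport", "sample2.BinExport", "out.BinDiff") after from bindiff import BinDiff.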

Usage as a command line

The bindiffer command line tool generates a diff file from two .BinExport files or directly from the binaries (thanks to python-binexport and idascript). The help message is the following:

Usage: bindiffer [OPTIONS] <primary file> <secondary file>

  bindiffer is a very simple utility to diff two binary files using BinDiff in command line. The two input files can be either binary files (in which case
  IDA is used) or directly .BinExport file (solely BinDiff is used).

Options:
  -i, --ida-path PATH      IDA Pro installation directory
  -b, --bindiff-path PATH  BinDiff differ directory
  -t, --type <type>        inputs files type ('bin', 'binexport') [default:'bin']
  -o, --output PATH        Output file matching
  -h, --help               Show this message and exit.

To work, the BinDiff differ binary should be in the $PATH, or given via the BINDIFF_PATH environment variable or the -b command option. Similarly, when diffing binaries directly, the ida64 binary should be available in the $PATH, or given via the IDA_PATH environment variable or the -i command option.
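When bindiffer has to be driven from a script, the command line can be assembled from the options listed in the help message. The sketch below only uses the -t and -o options shown there; the helper names bindiffer_argv and run_bindiffer are hypothetical.

```python
import subprocess


def bindiffer_argv(primary, secondary, output, input_type="bin"):
    """Build the argument list for a bindiffer invocation."""
    # The help message documents exactly these two input types.
    if input_type not in ("bin", "binexport"):
        raise ValueError(f"unsupported input type: {input_type!r}")
    return [
        "bindiffer",
        "-t", input_type,
        "-o", str(output),
        str(primary),
        str(secondary),
    ]


def run_bindiffer(primary, secondary, output, input_type="bin"):
    """Run bindiffer and return its exit code (requires bindiffer in PATH)."""
    argv = bindiffer_argv(primary, secondary, output, input_type)
    return subprocess.run(argv, check=False).returncode
```

Passing the arguments as a list avoids shell quoting issues with file names that contain spaces.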


API

BinDiff

class bindiff.BinDiff(primary: ProgramBinExport | str, secondary: ProgramBinExport | str, diff_file: str)

BinDiff class. Parses the diffing result of BinDiff and applies it to the two given ProgramBinExport objects. All the diff results are embedded in the two program objects, so the class can be dropped after loading if needed.

Warning

The two given programs are mutated into ProgramBinDiff objects, which inherit from SimilarityMixin and DictMatchMixin; these provide additional attributes and methods to the class.

Parameters:
  • primary – first program diffed

  • secondary – second program diffed

  • diff_file – diff file as generated by BinDiff (more specifically, by the differ)

static assert_installation_ok() -> None

Assert that BinDiff is installed.

Raises:

BindiffNotFound – if the bindiff binary cannot be found

static from_binary_files(p1_path: str, p2_path: str, diff_out: str, override: bool = False) -> BinDiff | None

Diff two executable files. It first exports the .BinExport files from IDA and then diffs the two resulting files with BinDiff.

Parameters:
  • p1_path – primary binary file to diff

  • p2_path – secondary binary file to diff

  • diff_out – output file for the diff

  • override – overwrite existing .BinExport files and diffing output

Returns:

BinDiff object representing the diff

static from_binexport_files(p1_binexport: ProgramBinExport | str, p2_binexport: ProgramBinExport | str, diff_out: str, override: bool = False) -> BinDiff | None

Diff two .BinExport files with BinDiff and then load a BinDiff instance from the result.

Parameters:
  • p1_binexport – primary binexport file to diff (path or object)

  • p2_binexport – secondary binexport file to diff (path or object)

  • diff_out – output file for the diff

  • override – overwrite existing .BinExport files and diffing output

Returns:

BinDiff object representing the diff

get_match(function: FunctionBinExport) -> tuple[FunctionBinExport, FunctionMatch] | None

Get the function that matches the provided one.

Parameters:

function – A function that belongs either to primary or secondary

Returns:

A tuple with the matched function and the match object if there is a match for the provided function, otherwise None
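Because get_match returns None for unmatched functions, the result has to be checked before unpacking. A minimal sketch follows; the helper name match_similarity is hypothetical and diff stands for a loaded BinDiff object.

```python
def match_similarity(diff, function):
    """Return the similarity of `function`'s match, or None if it is unmatched."""
    result = diff.get_match(function)
    if result is None:
        # No counterpart in the other program.
        return None
    # The tuple holds the matched function and the FunctionMatch object.
    _counterpart, match = result
    return match.similarity
```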

static is_installation_ok() -> bool

Check that bindiff is properly installed and can be found on the system.

Returns:

True if the bindiff binary can be found.

is_matched(function: FunctionBinExport) -> bool

Check whether the provided function has a match.

Parameters:

function – A function that belongs either to primary or secondary.

Returns:

True if there is a match for the provided function, False otherwise

iter_basicblock_matches(function1: FunctionBinExport, function2: FunctionBinExport) -> list[tuple[BasicBlockBinExport, BasicBlockBinExport, BasicBlockMatch]]

Return a list of all the matched basic blocks between the two provided functions. Each element of the list is a tuple containing the basic blocks of the primary and secondary functions and the BasicBlockMatch object describing the match. The first function must be part of the primary program while the second function must be part of the secondary program.

Parameters:
  • function1 – A function of the primary program

  • function2 – A function of the secondary program

Returns:

list of tuple, each containing the primary basic block, the secondary basic block and the BasicBlockMatch object

iter_function_matches() -> list[tuple[FunctionBinExport, FunctionBinExport, FunctionMatch]]

Return a list of all the matched functions. Each element of the list is a tuple containing the function in the primary program, the matched function in the secondary program and the FunctionMatch object describing the match

Returns:

list of tuple, each containing the primary function, the secondary function and the FunctionMatch object
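A common use of iter_function_matches is to keep only the functions that changed between the two binaries. The sketch below relies solely on the documented FunctionMatch fields (name1, name2, similarity); the helper name changed_functions and the default threshold are illustrative assumptions.

```python
def changed_functions(diff, threshold=1.0):
    """List matched functions whose similarity is below `threshold`.

    Returns (name in primary, name in secondary, similarity) tuples,
    sorted from the least similar to the most similar.
    """
    changed = []
    for _primary_fun, _secondary_fun, match in diff.iter_function_matches():
        if match.similarity < threshold:
            changed.append((match.name1, match.name2, match.similarity))
    return sorted(changed, key=lambda item: item[2])
```

Here diff is expected to be a loaded BinDiff object; with the default threshold, every match that is not a perfect one is reported.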

iter_instruction_matches(block1: BasicBlockBinExport, block2: BasicBlockBinExport) -> list[tuple[InstructionBinExport, InstructionBinExport]]

Return a list of all the matched instructions between the two provided basic blocks. Each element of the list is a tuple containing the instructions of the primary and secondary basic blocks. The first basic block must belong to the primary program while the second one must be part of the secondary program.

Parameters:
  • block1 – A basic block belonging to the primary program

  • block2 – A basic block belonging to the secondary program

Returns:

list of tuple, each containing the primary instruction and the secondary instruction
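The three iter_* methods nest naturally: function matches give the function pairs, which give the basic block pairs, which give the instruction pairs. As a sketch of that traversal (the helper name matched_instruction_counts is hypothetical), the following counts matched instructions per matched function:

```python
def matched_instruction_counts(diff):
    """Map each matched primary function address to its matched instruction count."""
    counts = {}
    for fun1, fun2, fun_match in diff.iter_function_matches():
        total = 0
        # Primary function first, secondary function second, as documented.
        for bb1, bb2, _bb_match in diff.iter_basicblock_matches(fun1, fun2):
            total += len(diff.iter_instruction_matches(bb1, bb2))
        counts[fun_match.address1] = total
    return counts
```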

primary

Primary BinExport object

primary_unmatched_basic_block(function: FunctionBinExport) -> list[BasicBlockBinExport]

Return a list of the unmatched basic blocks in the provided function. The function must be part of the primary program.

Parameters:

function – A function of the primary program

Returns:

list of unmatched basic blocks

primary_unmatched_function() -> list[FunctionBinExport]

Return a list of the unmatched functions in the primary program.

Returns:

list of unmatched functions in primary

primary_unmatched_instruction(bb: BasicBlockBinExport) -> list[InstructionBinExport]

Return a list of the unmatched instructions in the provided basic block. The basic block must be part of the primary program.

Parameters:

bb – A basic block belonging to the primary program

Returns:

list of unmatched instructions

static raw_diffing(p1_path: Path | str, p2_path: Path | str, out_diff: str) -> bool

Static method to diff two .BinExport files against each other and store the diffing result in the given file.

Parameters:
  • p1_path – primary file path

  • p2_path – secondary file path

  • out_diff – diffing output file

Returns:

True if successful, False otherwise

secondary

Secondary BinExport object

secondary_unmatched_basic_block(function: FunctionBinExport) -> list[BasicBlockBinExport]

Return a list of the unmatched basic blocks in the provided function. The function must be part of the secondary program.

Parameters:

function – A function of the secondary program

Returns:

list of unmatched basic blocks

secondary_unmatched_function() -> list[FunctionBinExport]

Return a list of the unmatched functions in the secondary program.

Returns:

list of unmatched functions in secondary
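The two *_unmatched_function methods can be combined into a quick overview of what exists on one side only. A minimal sketch, with a hypothetical helper name:

```python
def unmatched_summary(diff):
    """Count the functions that have no counterpart in the other program."""
    return {
        "primary": len(diff.primary_unmatched_function()),
        "secondary": len(diff.secondary_unmatched_function()),
    }
```

When the primary program is an older version of the secondary one, these counts can be read as removed and added functions respectively, although that interpretation depends on how well the differ matched the rest.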

secondary_unmatched_instruction(bb: BasicBlockBinExport) -> list[InstructionBinExport]

Return a list of the unmatched instructions in the provided basic block. The basic block must be part of the secondary program.

Parameters:

bb – A basic block belonging to the secondary program

Returns:

list of unmatched instructions

BinDiff File

class bindiff.file.BindiffFile(file: Path | str, permission: str = 'ro')

Bindiff database file. The class seamlessly parses the database and allows retrieving and manipulating the results.

It also provides some methods to create a database and to add entries in the database.

Parameters:
  • file – path to Bindiff database

  • permission – permission to use for opening database (default: ro)
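To peek at a database without going through BindiffFile, the standard sqlite3 module can be used, assuming that the .BinDiff file is a SQLite database (as the Connection parameter of init_database suggests). The sketch below opens the file read-only and lists its tables; the helper name list_tables is hypothetical and no table name is assumed.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the sorted table names of a SQLite file, opened read-only."""
    # A file: URI with mode=ro prevents any accidental modification.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    connection = sqlite3.connect(uri, uri=True)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]
```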

add_basic_block_match(fun_addr1: int, fun_addr2: int, bb_addr1: int, bb_addr2: int) -> int

Add a basic block match in database.

Parameters:
  • fun_addr1 – function address of basic block in primary

  • fun_addr2 – function address of basic block in secondary

  • bb_addr1 – basic block address in primary

  • bb_addr2 – basic block address in secondary

Returns:

id of the row inserted in database.

add_function_match(fun_addr1: int, fun_addr2: int, fun_name1: str, fun_name2: str, similarity: float, confidence: float = 0.0, identical_bbs: int = 0) -> int

Add a function match in database.

Parameters:
  • fun_addr1 – primary function address

  • fun_addr2 – secondary function address

  • fun_name1 – primary function name

  • fun_name2 – secondary function name

  • similarity – similarity score between the two functions

  • confidence – confidence score between the two functions

  • identical_bbs – number of identical basic blocks

Returns:

id of the row inserted in database.

add_instruction_match(entry: int, inst_addr1: int, inst_addr2: int) -> None

Add an instruction match in database.

Parameters:
  • entry – basic block match identifier in database

  • inst_addr1 – instruction address in primary

  • inst_addr2 – instruction address in secondary
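The three add_* methods chain together: the identifier returned by add_basic_block_match is the entry expected by add_instruction_match. The following sketch records one function match with its blocks and instructions; the helper name record_function_match and the shape of its arguments are assumptions made for illustration, and db stands for a writable BindiffFile.

```python
def record_function_match(db, fun1, fun2, block_pairs, similarity=1.0):
    """Store a function match with its basic block and instruction matches.

    fun1, fun2:  (address, name) of the primary and secondary function.
    block_pairs: list of ((bb_addr1, bb_addr2), [(inst_addr1, inst_addr2), ...]).
    Returns the row id of the function match.
    """
    (addr1, name1), (addr2, name2) = fun1, fun2
    row_id = db.add_function_match(addr1, addr2, name1, name2, similarity)
    for (bb_addr1, bb_addr2), instruction_pairs in block_pairs:
        # The returned id links the instruction matches to this block match.
        entry = db.add_basic_block_match(addr1, addr2, bb_addr1, bb_addr2)
        for inst_addr1, inst_addr2 in instruction_pairs:
            db.add_instruction_match(entry, inst_addr1, inst_addr2)
    return row_id
```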

property basicblock_matches: list[BasicBlockMatch]

Returns the list of matched basic blocks in primary (and secondary)

confidence: float

Overall diffing confidence

static create(filename: str, primary: str, secondary: str, version: str, desc: str, similarity: float, confidence: float) -> BindiffFile

Create a new Bindiff database object in the file given in filename. It only takes two binaries.

Parameters:
  • filename – database file path

  • primary – path to primary binary

  • secondary – path to secondary binary

  • version – version of the differ used

  • desc – description of the database

  • similarity – similarity score between the two binaries

  • confidence – confidence of results

Returns:

instance of BindiffFile (ready to be filled)

created: datetime

Database creation date

property function_matches: list[FunctionMatch]

Returns the list of matched functions

static init_database(db: Connection) -> None

Initialize the database by creating all the tables

modified: datetime

Database last modification date

primary_basicblock_match: dict[int, dict[int, BasicBlockMatch]]

Basic block match from primary

primary_file: File

Primary file

primary_functions_match: dict[int, FunctionMatch]

FunctionMatch indexed by addresses in primary

secondary_basicblock_match: dict[int, dict[int, BasicBlockMatch]]

Basic block match from secondary

secondary_file: File

Secondary file

secondary_functions_match: dict[int, FunctionMatch]

FunctionMatch indexed by addresses in secondary

similarity: float

Overall similarity

property unmatched_primary_count: int

Returns the number of functions inside primary that are not matched

property unmatched_secondary_count: int

Returns the number of functions inside secondary that are not matched

update_file_infos(entry_id: int, fun_count: int, lib_count: int, bb_count: int, inst_count: int) -> None

Update information about a binary in database (function, basic block count …)

Parameters:
  • entry_id – entry of the binary in database (row id)

  • fun_count – number of functions

  • lib_count – number of functions flagged as libraries

  • bb_count – number of basic blocks

  • inst_count – number of instructions

version: str

version of the differ used for diffing

Types

class bindiff.file.File(id: int, filename: str, exefilename: str, hash: str, functions: int, libfunctions: int, calls: int, basicblocks: int, libbasicblocks: int, edges: int, libedges: int, instructions: int, libinstructions: int)

File diffed in database.

basicblocks: int

number of basic blocks

calls: int

number of calls

edges: int

number of edges in callgraph

exefilename: str

file name

filename: str

file path

functions: int

total number of functions

hash: str

SHA256 hash of the file

id: int

Unique ID of the file in database

instructions: int

number of instructions

libbasicblocks: int

number of basic blocks belonging to library functions

libedges: int

number of edges in callgraph addressing a library

libfunctions: int

total number of functions identified as library

libinstructions: int

number of instructions in library functions

class bindiff.file.FunctionMatch(id: int, address1: int, name1: str, address2: int, name2: str, similarity: float, confidence: float, algorithm: FunctionAlgorithm)

A match between two functions in database.

address1: int

function address in primary

address2: int

function address in secondary

algorithm: FunctionAlgorithm

algorithm used for the match

confidence: float

confidence of the match (0..1)

id: int

unique ID of function match in database

name1: str

function name in primary

name2: str

function name in secondary

similarity: float

similarity score (0..1)

class bindiff.file.BasicBlockMatch(id: int, function_match: FunctionMatch, address1: int, address2: int, algorithm: BasicBlockAlgorithm)

A match between two basic blocks

address1: int

basic block address in primary

address2: int

basic block address in secondary

algorithm: BasicBlockAlgorithm

algorithm used to match the basic blocks

function_match: FunctionMatch

FunctionMatch associated with this match

id: int

ID of the match in database

enum bindiff.types.BasicBlockAlgorithm(value)

Bases: IntEnum

Basic block matching algorithm enum. (The IDs do not seem to change in BinDiff, so they are hardcoded here.)

Member Type:

int

Valid values are as follows:

edges_prime_product = <BasicBlockAlgorithm.edges_prime_product: 1>
hash_matching_four_inst_min = <BasicBlockAlgorithm.hash_matching_four_inst_min: 2>
prime_matching_four_inst_min = <BasicBlockAlgorithm.prime_matching_four_inst_min: 3>
call_reference_matching = <BasicBlockAlgorithm.call_reference_matching: 4>
string_references_matching = <BasicBlockAlgorithm.string_references_matching: 5>
edges_md_index_top_down = <BasicBlockAlgorithm.edges_md_index_top_down: 6>
md_index_matching_top_down = <BasicBlockAlgorithm.md_index_matching_top_down: 7>
edges_md_index_bottom_up = <BasicBlockAlgorithm.edges_md_index_bottom_up: 8>
md_index_matching_bottom_up = <BasicBlockAlgorithm.md_index_matching_bottom_up: 9>
relaxed_md_index_matching = <BasicBlockAlgorithm.relaxed_md_index_matching: 10>
prime_matching_no_inst_min = <BasicBlockAlgorithm.prime_matching_no_inst_min: 11>
edges_lengauer_tarjan_dominated = <BasicBlockAlgorithm.edges_lengauer_tarjan_dominated: 12>
loop_entry_matching = <BasicBlockAlgorithm.loop_entry_matching: 13>
self_loop_matching = <BasicBlockAlgorithm.self_loop_matching: 14>
entry_point_matching = <BasicBlockAlgorithm.entry_point_matching: 15>
exit_point_matching = <BasicBlockAlgorithm.exit_point_matching: 16>
instruction_count_matching = <BasicBlockAlgorithm.instruction_count_matching: 17>
jump_sequence_matching = <BasicBlockAlgorithm.jump_sequence_matching: 18>
propagation_size_one = <BasicBlockAlgorithm.propagation_size_one: 19>
manual = <BasicBlockAlgorithm.manual: 20>

exception bindiff.types.BindiffNotFound

Bases: Exception

Exception raised if Bindiff binary cannot be found when trying to diff two binaries.

enum bindiff.types.FunctionAlgorithm(value)

Bases: IntEnum

Function matching algorithm enum. (The IDs do not seem to change in BinDiff, so they are hardcoded here.)

Member Type:

int

Valid values are as follows:

name_hash_matching = <FunctionAlgorithm.name_hash_matching: 1>
hash_matching = <FunctionAlgorithm.hash_matching: 2>
edges_flowgraph_md_index = <FunctionAlgorithm.edges_flowgraph_md_index: 3>
edges_callgraph_md_index = <FunctionAlgorithm.edges_callgraph_md_index: 4>
md_index_matching_flowgraph_top_down = <FunctionAlgorithm.md_index_matching_flowgraph_top_down: 5>
md_index_matching_flowgraph_bottom_up = <FunctionAlgorithm.md_index_matching_flowgraph_bottom_up: 6>
prime_signature_matching = <FunctionAlgorithm.prime_signature_matching: 7>
md_index_matching_callGraph_top_down = <FunctionAlgorithm.md_index_matching_callGraph_top_down: 8>
md_index_matching_callGraph_bottom_up = <FunctionAlgorithm.md_index_matching_callGraph_bottom_up: 9>
relaxed_md_index_matching = <FunctionAlgorithm.relaxed_md_index_matching: 10>
instruction_count = <FunctionAlgorithm.instruction_count: 11>
address_sequence = <FunctionAlgorithm.address_sequence: 12>
string_references = <FunctionAlgorithm.string_references: 13>
loop_count_matching = <FunctionAlgorithm.loop_count_matching: 14>
call_sequence_matching_exact = <FunctionAlgorithm.call_sequence_matching_exact: 15>
call_sequence_matching_topology = <FunctionAlgorithm.call_sequence_matching_topology: 16>
call_sequence_matching_sequence = <FunctionAlgorithm.call_sequence_matching_sequence: 17>
call_reference_matching = <FunctionAlgorithm.call_reference_matching: 18>
manual = <FunctionAlgorithm.manual: 19>
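Since every match carries its algorithm as an enum member, a histogram of the algorithms used gives a rough idea of how the differ reached its result. A minimal sketch, with a hypothetical helper name, that works on any iterable of FunctionMatch or BasicBlockMatch objects:

```python
from collections import Counter


def algorithm_histogram(matches):
    """Count the matches per matching algorithm, keyed by enum member name."""
    return Counter(match.algorithm.name for match in matches)
```

For instance, algorithm_histogram(BindiffFile("diff.BinDiff").function_matches) would summarize the function matches of a database, and .most_common() on the result lists the dominant algorithms first.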