BinDiff

BinDiff is one of the oldest and most widely used differs in the reverse engineering community. First developed at Zynamics, it was later acquired by Google. It relies on properties of the call graph to establish matches between the functions of two binaries. For more details about BinDiff heuristics, have a look at one of the first papers or the documentation.

Limitations: BinDiff has primarily been designed to be used through its main Java GUI application. A diff can be triggered from the command line, but no API has been implemented to trigger a diff programmatically. Similarly, no API enables manipulating the diff result itself.

As such, python-bindiff has been developed to provide a Python API for triggering a diff and manipulating its content.


Python Bindiff

python-bindiff is a Python module that aims to provide a friendly interface to launch and manipulate a BinDiff diff between two binary files.

How it works

The module relies on python-binexport to extract the .BinExport file of each program and then interacts directly with the binary differ (from Zynamics) to perform the diff. The generated diff file is then correlated with the two binaries so that the changes can be navigated.

Installation

The Python module requires BinDiff, so first follow the Zynamics installation instructions.

Then the python module can be installed with:

pip install python-bindiff

The python module needs to execute the differ executable. As such it should be available:

  • either in the path

  • or via the BINDIFF_PATH environment variable
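
The lookup described above can be sketched as follows. This is only an illustration, not the module's actual code: the function name find_differ, the executable name "bindiff", and the assumption that BINDIFF_PATH holds a directory are all hypothetical.

```python
import os
import shutil
from pathlib import Path
from typing import Mapping, Optional

# Assumption: the differ executable is called "bindiff"; adjust to your install.
DIFFER_NAME = "bindiff"


def find_differ(name: str = DIFFER_NAME,
                environ: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the differ path found via BINDIFF_PATH or the PATH, else None."""
    env = os.environ if environ is None else environ
    env_dir = env.get("BINDIFF_PATH")
    if env_dir:
        # Search only the directory given by the environment variable.
        found = shutil.which(name, path=env_dir)
        if found:
            return Path(found)
    # Fall back to the regular executable search path.
    found = shutil.which(name, path=env.get("PATH"))
    return Path(found) if found else None
```

Checking the environment variable first lets a user point at a specific installation even when another differ is on the path; that ordering is a choice of this sketch.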

Usage as a python module

The simplest way to get the programs diffed is:

from bindiff import BinDiff

diff = BinDiff("sample1.BinExport", "sample2.BinExport", "diff.BinDiff")
print(diff.similarity, diff.confidence)
# do whatever you want with diff.primary and diff.secondary, which are the
# two Program objects

But programs can also be instantiated separately:

from binexport import ProgramBinExport
from bindiff import BinDiff
p1 = ProgramBinExport("sample1.BinExport")
p2 = ProgramBinExport("sample2.BinExport")

diff = BinDiff(p1, p2, "diff.BinDiff")

Note that all the diff data is embedded inside the program objects, so after instantiating BinDiff, p1 and p2 are modified.

From the API it is also possible to directly perform the BinExport extraction and the diffing:

from bindiff import BinDiff

diff = BinDiff.from_binary_files("sample1.exe", "sample2.exe", "out.BinDiff")

# or performing the diff on BinExport files
diff = BinDiff.from_binexport_files("sample1.BinExport", "sample2.BinExport", "out.BinDiff")

Usage as a command line

The bindiffer command-line tool generates a diff file from two .BinExport files or directly from the binaries (thanks to python-binexport and idascript). The help message is the following:

Usage: bindiffer [OPTIONS] <primary file> <secondary file>

  bindiffer is a very simple utility to diff two binary files using BinDiff in command line. The two input files can be either binary files (in which case
  IDA is used) or directly .BinExport file (solely BinDiff is used).

Options:
  -i, --ida-path PATH      IDA Pro installation directory
  -b, --bindiff-path PATH  BinDiff differ directory
  -t, --type <type>        inputs files type ('bin', 'binexport') [default:'bin']
  -o, --output PATH        Output file matching
  -h, --help               Show this message and exit.

To work, the BinDiff differ binary must be in the $PATH, given via the BINDIFF_PATH environment variable, or given with the -b command option. Similarly, when diffing binaries directly, the ida64 binary must be available in the $PATH, given with the IDA_PATH environment variable, or given via the -i command option.
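
When the tool is driven from a script, the argument list can be assembled from the options shown in the help message. The helper below is a hypothetical convenience (bindiffer_command is not part of the package); it only builds the list and leaves execution to the caller.

```python
from typing import List, Optional


def bindiffer_command(primary: str, secondary: str, output: str,
                      file_type: str = "bin",
                      ida_path: Optional[str] = None,
                      bindiff_path: Optional[str] = None) -> List[str]:
    """Build a bindiffer argument list using only the documented options."""
    if file_type not in ("bin", "binexport"):
        raise ValueError("file_type must be 'bin' or 'binexport'")
    cmd = ["bindiffer", "-t", file_type, "-o", output]
    if ida_path:
        cmd += ["-i", ida_path]      # IDA Pro installation directory
    if bindiff_path:
        cmd += ["-b", bindiff_path]  # BinDiff differ directory
    cmd += [primary, secondary]
    return cmd


# The list can then be handed to subprocess.run(cmd, check=True).
example = bindiffer_command("sample1.BinExport", "sample2.BinExport",
                            "out.BinDiff", file_type="binexport")
```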


API

BinDiff

class bindiff.BinDiff(primary: ProgramBinExport | str, secondary: ProgramBinExport | str, diff_file: str)

BinDiff class. Parses the diffing result of BinDiff and applies it to the two ProgramBinExport objects given. The whole diff result is embedded in the two program objects, so after loading, the class can be dropped if needed.

Warning

The two programs given are mutated into ProgramBinDiff objects, which inherit from SimilarityMixin and DictMatchMixin; these provide additional attributes and methods to the class.
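
One way such an in-place mutation can be achieved in Python is by reassigning the object's class. The sketch below illustrates the general technique with stand-in classes (Program, DiffedProgram, SimilaritySketchMixin and apply_diff are all hypothetical names); it is not claimed to be how the library does it.

```python
from typing import Optional


class Program:
    """Stand-in for an exported program."""

    def __init__(self, name: str) -> None:
        self.name = name


class SimilaritySketchMixin:
    """Adds read-only similarity and confidence attributes."""

    _similarity: Optional[float] = None
    _confidence: Optional[float] = None

    @property
    def similarity(self) -> Optional[float]:
        return self._similarity

    @property
    def confidence(self) -> Optional[float]:
        return self._confidence


class DiffedProgram(SimilaritySketchMixin, Program):
    """Program enriched with diff attributes."""


def apply_diff(program: Program, similarity: float, confidence: float) -> None:
    # Swap the class in place: every existing reference sees the new type.
    program.__class__ = DiffedProgram
    program._similarity = similarity
    program._confidence = confidence


p1 = Program("sample1")
alias = p1                      # a second reference to the same object
apply_diff(p1, 0.93, 0.97)
```

Because the object itself changes type, alias also exposes similarity and confidence afterwards, which is why code holding references to the original programs sees the diff data.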

Parameters:
  • primary – first program diffed

  • secondary – second program diffed

  • diff_file – diffing file as generated by bindiff (differ more specifically)

static assert_installation_ok() -> None

Assert BinDiff is installed

Raises:

BindiffNotFound – if the bindiff binary cannot be found

static from_binary_files(p1_path: str, p2_path: str, diff_out: str) -> BinDiff | None

Diff two executable files. It exports the .BinExport files from IDA and then diffs the two resulting files with BinDiff.

Parameters:
  • p1_path – primary binary file to diff

  • p2_path – secondary binary file to diff

  • diff_out – output file for the diff

Returns:

BinDiff object representing the diff

static from_binexport_files(p1_binexport: str, p2_binexport: str, diff_out: str) -> BinDiff | None

Diff two binexport files. Diff the two binexport files with bindiff and then load a BinDiff instance.

Parameters:
  • p1_binexport – primary binexport file to diff

  • p2_binexport – secondary binexport file to diff

  • diff_out – output file for the diff

Returns:

BinDiff object representing the diff

static is_installation_ok() -> bool

Check that bindiff is properly installed and can be found on the system.

Returns:

True if the bindiff binary can be found.

primary

Primary BinExport object

static raw_diffing(p1_path: Path | str, p2_path: Path | str, out_diff: str) -> bool

Static method to diff two binexport files against each other and store the diffing result in the given file.

Parameters:
  • p1_path – primary file path

  • p2_path – secondary file path

  • out_diff – diffing output file

Returns:

True if successful, False otherwise

secondary

Secondary BinExport object

BinDiff File

class bindiff.file.BindiffFile(file: Path | str, permission: str = 'ro')

Bindiff database file. The class seamlessly parses the database and allows retrieving and manipulating the results.

It also provides some methods to create a database and to add entries in the database.

Parameters:
  • file – path to Bindiff database

  • permission – permission to use for opening database (default: ro)

add_basic_block_match(fun_addr1: int, fun_addr2: int, bb_addr1: int, bb_addr2: int) -> int

Add a basic block match in database.

Parameters:
  • fun_addr1 – function address of basic block in primary

  • fun_addr2 – function address of basic block in secondary

  • bb_addr1 – basic block address in primary

  • bb_addr2 – basic block address in secondary

Returns:

id of the row inserted in database.

add_function_match(fun_addr1: int, fun_addr2: int, fun_name1: str, fun_name2: str, similarity: float, confidence: float = 0.0, identical_bbs: int = 0) -> int

Add a function match in database.

Parameters:
  • fun_addr1 – primary function address

  • fun_addr2 – secondary function address

  • fun_name1 – primary function name

  • fun_name2 – secondary function name

  • similarity – similarity score between the two functions

  • confidence – confidence score between the two functions

  • identical_bbs – number of identical basic blocks

Returns:

id of the row inserted in database.

add_instruction_match(entry: int, inst_addr1: int, inst_addr2: int) -> None

Add an instruction match in database.

Parameters:
  • entry – basic block match identifier in database

  • inst_addr1 – instruction address in primary

  • inst_addr2 – instruction address in secondary
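
The three add_* methods chain through row ids: a basic block match refers to its function match, and an instruction match refers to its basic block match through entry. The sketch below mimics that chaining with sqlite3, assuming the database is SQLite (as the Connection parameter of init_database suggests). The table and column names are a simplified, hypothetical schema and not the real BinDiff layout; the functions are stand-alone stand-ins, not the BindiffFile methods.

```python
import sqlite3

# Hypothetical, simplified schema used only for this illustration.
SCHEMA = """
CREATE TABLE function (
    id INTEGER PRIMARY KEY, address1 INTEGER, address2 INTEGER, similarity REAL);
CREATE TABLE basicblock (
    id INTEGER PRIMARY KEY, functionid INTEGER, address1 INTEGER, address2 INTEGER);
CREATE TABLE instruction (
    basicblockid INTEGER, address1 INTEGER, address2 INTEGER);
"""


def add_function_match(db, fun_addr1, fun_addr2, similarity) -> int:
    cur = db.execute(
        "INSERT INTO function (address1, address2, similarity) VALUES (?, ?, ?)",
        (fun_addr1, fun_addr2, similarity))
    return cur.lastrowid  # id of the inserted row


def add_basic_block_match(db, fun_addr1, fun_addr2, bb_addr1, bb_addr2) -> int:
    # Resolve the owning function match from the two function addresses.
    row = db.execute(
        "SELECT id FROM function WHERE address1 = ? AND address2 = ?",
        (fun_addr1, fun_addr2)).fetchone()
    if row is None:
        raise KeyError("no function match for these addresses")
    cur = db.execute(
        "INSERT INTO basicblock (functionid, address1, address2) VALUES (?, ?, ?)",
        (row[0], bb_addr1, bb_addr2))
    return cur.lastrowid


def add_instruction_match(db, entry, inst_addr1, inst_addr2) -> None:
    # `entry` is the row id returned by add_basic_block_match.
    db.execute("INSERT INTO instruction VALUES (?, ?, ?)",
               (entry, inst_addr1, inst_addr2))


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
fun_id = add_function_match(db, 0x401000, 0x402000, 0.95)
bb_id = add_basic_block_match(db, 0x401000, 0x402000, 0x401010, 0x402010)
add_instruction_match(db, bb_id, 0x401012, 0x402012)
db.commit()
```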

property basicblock_matches: list[BasicBlockMatch]

Returns the list of matched basic blocks in primary (and secondary)

confidence: float

Overall diffing confidence

static create(filename: str, primary: str, secondary: str, version: str, desc: str, similarity: float, confidence: float) -> BindiffFile

Create a new Bindiff database object in the file given in filename. It only takes two binaries.

Parameters:
  • filename – database file path

  • primary – path to primary binary

  • secondary – path to secondary binary

  • version – version of the differ used

  • desc – description of the database

  • similarity – similarity score between the two binaries

  • confidence – confidence of results

Returns:

instance of BindiffFile (ready to be filled)

created: datetime

Database creation date

property function_matches: list[FunctionMatch]

Returns the list of matched functions

static init_database(db: Connection) -> None

Initialize the database by creating all the tables

modified: datetime

Database last modification date

primary_basicblock_match: dict[int, dict[int, BasicBlockMatch]]

Basic block match from primary

primary_file: File

Primary file

primary_functions_match: dict[int, FunctionMatch]

FunctionMatch indexed by addresses in primary

secondary_basicblock_match: dict[int, dict[int, BasicBlockMatch]]

Basic block match from secondary

secondary_file: File

Secondary file

secondary_functions_match: dict[int, FunctionMatch]

FunctionMatch indexed by addresses in secondary

similarity: float

Overall similarity

property unmatched_primary_count: int

Returns the number of functions inside primary that are not matched

property unmatched_secondary_count: int

Returns the number of functions inside secondary that are not matched
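
Keeping one index per side makes lookups cheap in both directions, and anything absent from an index is unmatched. A minimal sketch of that idea with plain address pairs follows; index_pairs and unmatched are hypothetical helpers, and how the library actually computes its counts is not specified here.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[int, int]  # (address in primary, address in secondary)


def index_pairs(pairs: Iterable[Pair]) -> Tuple[Dict[int, Pair], Dict[int, Pair]]:
    """Index the same matches by primary address and by secondary address."""
    by_primary: Dict[int, Pair] = {}
    by_secondary: Dict[int, Pair] = {}
    for pair in pairs:
        by_primary[pair[0]] = pair
        by_secondary[pair[1]] = pair
    return by_primary, by_secondary


def unmatched(all_addresses: Iterable[int], index: Dict[int, Pair]) -> List[int]:
    """Addresses of one side that appear in no match."""
    return [addr for addr in all_addresses if addr not in index]


pairs = [(0x1000, 0x2000), (0x1100, 0x2150)]
by_primary, by_secondary = index_pairs(pairs)
missing = unmatched([0x1000, 0x1100, 0x1200], by_primary)
```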

update_file_infos(entry_id: int, fun_count: int, lib_count: int, bb_count: int, inst_count: int) -> None

Update information about a binary in database (function, basic block count …)

Parameters:
  • entry_id – entry of the binary in database (row id)

  • fun_count – number of functions

  • lib_count – number of functions flagged as libraries

  • bb_count – number of basic blocks

  • inst_count – number of instructions

version: str

version of the differ used for diffing

Types

class bindiff.file.File(id: int, filename: str, exefilename: str, hash: str, functions: int, libfunctions: int, calls: int, basicblocks: int, libbasicblocks: int, edges: int, libedges: int, instructions: int, libinstructions: int)

File diffed in database.

basicblocks: int

number of basic blocks

calls: int

number of calls

edges: int

number of edges in callgraph

exefilename: str

file name

filename: str

file path

functions: int

total number of functions

hash: str

SHA256 hash of the file

id: int

Unique ID of the file in database

instructions: int

number of instructions

libbasicblocks: int

number of basic blocks belonging to library functions

libedges: int

number of edges in callgraph addressing a library

libfunctions: int

total number of functions identified as library

libinstructions: int

number of instructions in library functions

class bindiff.file.FunctionMatch(id: int, address1: int, name1: str, address2: int, name2: str, similarity: float, confidence: float, algorithm: FunctionAlgorithm)

A match between two functions in database.

address1: int

function address in primary

address2: int

function address in secondary

algorithm: FunctionAlgorithm

algorithm used for the match

confidence: float

confidence of the match (0..1)

id: int

unique ID of function match in database

name1: str

function name in primary

name2: str

function name in secondary

similarity: float

similarity score (0..1)
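
Since both scores lie in 0..1, a common triage step is to keep the matches whose similarity is below 1 and rank them from most to least changed. The sketch below uses a stand-in dataclass carrying a subset of the documented fields; MatchSketch, changed_functions and the 0.5 confidence threshold are assumptions of this example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class MatchSketch:
    """Stand-in holding a subset of the FunctionMatch fields."""
    address1: int
    name1: str
    address2: int
    name2: str
    similarity: float
    confidence: float


def changed_functions(matches: Iterable[MatchSketch],
                      min_confidence: float = 0.5) -> List[MatchSketch]:
    """Matches that differ (similarity < 1), lowest similarity first."""
    changed = [m for m in matches
               if m.similarity < 1.0 and m.confidence >= min_confidence]
    return sorted(changed, key=lambda m: m.similarity)


sample = [
    MatchSketch(0x1000, "main", 0x2000, "main", 1.0, 0.99),
    MatchSketch(0x1300, "init", 0x2300, "init", 0.95, 0.8),
    MatchSketch(0x1100, "parse", 0x2100, "parse", 0.72, 0.9),
    MatchSketch(0x1200, "sub_1200", 0x2200, "sub_2200", 0.4, 0.2),
]
ranked = changed_functions(sample)
```

Filtering on confidence drops weak matches whose low similarity may only reflect a wrong pairing.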

class bindiff.file.BasicBlockMatch(id: int, function_match: FunctionMatch, address1: int, address2: int, algorithm: BasicBlockAlgorithm)

A match between two basic blocks

address1: int

basic block address in primary

address2: int

basic block address in secondary

algorithm: BasicBlockAlgorithm

algorithm used to match the basic blocks

function_match: FunctionMatch

FunctionMatch associated with this match

id: int

ID of the match in database

class bindiff.types.AlgorithmMixin

Bases: object

Mixin class representing the matching algorithm as given by bindiff

property algorithm: BasicBlockAlgorithm | FunctionAlgorithm | None

enum bindiff.types.BasicBlockAlgorithm(value)

Bases: IntEnum

Basic block matching algorithm enum. (The ids do not seem to change in BinDiff, so they are hardcoded here.)

Member Type:

int

Valid values are as follows:

edges_prime_product = <BasicBlockAlgorithm.edges_prime_product: 1>
hash_matching_four_inst_min = <BasicBlockAlgorithm.hash_matching_four_inst_min: 2>
prime_matching_four_inst_min = <BasicBlockAlgorithm.prime_matching_four_inst_min: 3>
call_reference_matching = <BasicBlockAlgorithm.call_reference_matching: 4>
string_references_matching = <BasicBlockAlgorithm.string_references_matching: 5>
edges_md_index_top_down = <BasicBlockAlgorithm.edges_md_index_top_down: 6>
md_index_matching_top_down = <BasicBlockAlgorithm.md_index_matching_top_down: 7>
edges_md_index_bottom_up = <BasicBlockAlgorithm.edges_md_index_bottom_up: 8>
md_index_matching_bottom_up = <BasicBlockAlgorithm.md_index_matching_bottom_up: 9>
relaxed_md_index_matching = <BasicBlockAlgorithm.relaxed_md_index_matching: 10>
prime_matching_no_inst_min = <BasicBlockAlgorithm.prime_matching_no_inst_min: 11>
edges_lengauer_tarjan_dominated = <BasicBlockAlgorithm.edges_lengauer_tarjan_dominated: 12>
loop_entry_matching = <BasicBlockAlgorithm.loop_entry_matching: 13>
self_loop_matching = <BasicBlockAlgorithm.self_loop_matching: 14>
entry_point_matching = <BasicBlockAlgorithm.entry_point_matching: 15>
exit_point_matching = <BasicBlockAlgorithm.exit_point_matching: 16>
instruction_count_matching = <BasicBlockAlgorithm.instruction_count_matching: 17>
jump_sequence_matching = <BasicBlockAlgorithm.jump_sequence_matching: 18>
propagation_size_one = <BasicBlockAlgorithm.propagation_size_one: 19>
manual = <BasicBlockAlgorithm.manual: 20>

class bindiff.types.BasicBlockBinDiff(program: ReferenceType[ProgramBinExport], function: ReferenceType[FunctionBinExport], pb_bb: BinExport2.BasicBlock)

Bases: DictMatchMixin, AlgorithmMixin, BasicBlockBinExport

Diffed basic block. Enriches BasicBlockBinExport with the match and algorithm attributes (and their associated methods).

Parameters:
  • program – Weak reference to the program

  • function – Weak reference to the function

  • pb_bb – protobuf definition of the basic block

exception bindiff.types.BindiffNotFound

Bases: Exception

Exception raised if Bindiff binary cannot be found when trying to diff two binaries.

class bindiff.types.DictMatchMixin

Bases: MatchMixin

Extension of MatchMixin applied to dict objects to compute the number of matched / unmatched objects within the dict.

property nb_match: int
property nb_unmatch: int

enum bindiff.types.FunctionAlgorithm(value)

Bases: IntEnum

Function matching algorithm enum. (The ids do not seem to change in BinDiff, so they are hardcoded here.)

Member Type:

int

Valid values are as follows:

name_hash_matching = <FunctionAlgorithm.name_hash_matching: 1>
hash_matching = <FunctionAlgorithm.hash_matching: 2>
edges_flowgraph_md_index = <FunctionAlgorithm.edges_flowgraph_md_index: 3>
edges_callgraph_md_index = <FunctionAlgorithm.edges_callgraph_md_index: 4>
md_index_matching_flowgraph_top_down = <FunctionAlgorithm.md_index_matching_flowgraph_top_down: 5>
md_index_matching_flowgraph_bottom_up = <FunctionAlgorithm.md_index_matching_flowgraph_bottom_up: 6>
prime_signature_matching = <FunctionAlgorithm.prime_signature_matching: 7>
md_index_matching_callGraph_top_down = <FunctionAlgorithm.md_index_matching_callGraph_top_down: 8>
md_index_matching_callGraph_bottom_up = <FunctionAlgorithm.md_index_matching_callGraph_bottom_up: 9>
relaxed_md_index_matching = <FunctionAlgorithm.relaxed_md_index_matching: 10>
instruction_count = <FunctionAlgorithm.instruction_count: 11>
address_sequence = <FunctionAlgorithm.address_sequence: 12>
string_references = <FunctionAlgorithm.string_references: 13>
loop_count_matching = <FunctionAlgorithm.loop_count_matching: 14>
call_sequence_matching_exact = <FunctionAlgorithm.call_sequence_matching_exact: 15>
call_sequence_matching_topology = <FunctionAlgorithm.call_sequence_matching_topology: 16>
call_sequence_matching_sequence = <FunctionAlgorithm.call_sequence_matching_sequence: 17>
call_reference_matching = <FunctionAlgorithm.call_reference_matching: 18>
manual = <FunctionAlgorithm.manual: 19>
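
Because the enum derives from IntEnum, an integer id read from a diff database converts directly to a named member. The sketch below redeclares a few of the members listed above so that it runs on its own; the raw ids are made-up sample data, and returning None for an unknown id is a choice of this sketch.

```python
from collections import Counter
from enum import IntEnum
from typing import Optional


class FunctionAlgorithmSketch(IntEnum):
    """Subset of the documented members, copied with their values."""
    name_hash_matching = 1
    hash_matching = 2
    edges_flowgraph_md_index = 3
    manual = 19


def to_algorithm(raw_id: int) -> Optional[FunctionAlgorithmSketch]:
    """Convert a raw id to a member, or None if the id is unknown."""
    try:
        return FunctionAlgorithmSketch(raw_id)
    except ValueError:
        return None


raw_ids = [1, 1, 2, 19, 42]  # made-up ids; 42 is deliberately unknown
algorithms = [to_algorithm(i) for i in raw_ids]
histogram = Counter(a.name for a in algorithms if a is not None)
```

A histogram of this kind gives a quick view of which heuristics produced the matches in a diff.
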
class bindiff.types.FunctionBinDiff(program: ReferenceType[ProgramBinExport], *, pb_fun: BinExport2.FlowGraph | None = None, is_import: bool = False, addr: int | None = None)

Bases: DictMatchMixin, AlgorithmMixin, SimilarityMixin, FunctionBinExport

Function class to represent a diffed function. Enriches FunctionBinExport with match, similarity, confidence and algorithm attributes.

Constructor. Iterates over the FlowGraph structure and initializes all the basic blocks and instructions accordingly.

Parameters:
  • program – weak reference to program (used to navigate pb fields contained inside)

  • pb_fun – FlowGraph protobuf structure

  • is_import – whether or not it is an import function (if so, basic blocks etc. are not initialized)

  • addr – address of the function (info available in the call graph)

class bindiff.types.InstructionBinDiff(program: ReferenceType[ProgramBinExport], function: ReferenceType[FunctionBinExport], addr: int, i_idx: int)

Bases: MatchMixin, InstructionBinExport

Diffed instruction. Simply adds the match attribute to the InstructionBinExport class.

Parameters:
  • program – Weak reference to the program

  • function – Weak reference to the function

  • addr – address of the instruction (computed outside)

  • i_idx – instruction index in the protobuf data structure

class bindiff.types.MatchMixin

Bases: object

Mixin class to represent a match between two objects.

is_matched() -> bool
property match: object | None
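
How a dict-based container can count its matched children is sketched below. The classes are re-created from the documented member names only (match, is_matched, nb_match, nb_unmatch); the Sketch suffix marks them as stand-ins, the setter is an assumption, and the library's internals may differ.

```python
from typing import Optional


class MatchMixinSketch:
    """Holds an optional reference to the matched counterpart."""

    _match: Optional[object] = None

    @property
    def match(self) -> Optional[object]:
        return self._match

    @match.setter
    def match(self, value: Optional[object]) -> None:
        self._match = value

    def is_matched(self) -> bool:
        return self._match is not None


class DictMatchMixinSketch(MatchMixinSketch):
    """For dict subclasses: counts matched and unmatched values."""

    @property
    def nb_match(self) -> int:
        return sum(1 for item in self.values() if item.is_matched())

    @property
    def nb_unmatch(self) -> int:
        return sum(1 for item in self.values() if not item.is_matched())


class Block(MatchMixinSketch):
    """Stand-in for a child object such as a basic block."""


class Function(DictMatchMixinSketch, dict):
    """Stand-in container mapping addresses to blocks."""


primary = Function({0x10: Block(), 0x20: Block(), 0x30: Block()})
secondary = Function({0x50: Block()})
primary[0x10].match = secondary[0x50]   # pair two blocks in both directions
secondary[0x50].match = primary[0x10]
```

The container is itself matchable, which mirrors the documented hierarchy where DictMatchMixin derives from MatchMixin.
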
class bindiff.types.ProgramBinDiff(file: Path | str)

Bases: DictMatchMixin, SimilarityMixin, ProgramBinExport

Program class to represent a diffed binary. Basically enriches a ProgramBinExport class with match, similarity and confidence attributes and the associated methods.

Parameters:

file – BinExport file path

class bindiff.types.SimilarityMixin

Bases: object

Mixin class to represent a similarity between two entities, with a confidence level.

property confidence: float | None
property similarity: float | None