

class qbindiff.GenericGraph[source]

Bases: object

Abstract class representing a generic graph

abstract property edges: Iterable[tuple[NodeLabel, NodeLabel]]

Iterate over the edges. An edge is a pair (node_label_a, node_label_b)


An Iterable over the edges.

abstract get_node(node_label: NodeLabel) GenericNode[source]

Get the node identified by the node_label


node_label – the unique identifier of the node


The node identified by the label

abstract items() Iterable[tuple[NodeLabel, GenericNode]][source]

Iterate over the items. Each item is a pair (node_label, node)


An Iterable over the items. Each item is a tuple (node_label, node)

abstract property node_labels: Iterable[NodeLabel]

Iterate over the node labels


An Iterable over the node labels

abstract property nodes: Iterable[GenericNode]

Iterate over the nodes themselves


An Iterable over the nodes


class qbindiff.GenericNode[source]

Bases: Hashable

Abstract class representing a generic node

abstract get_label() NodeLabel[source]

Get the label associated to this node


The node label associated with this node
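A minimal sketch of how the two abstract interfaces above could be satisfied. To keep it runnable without qbindiff installed, the GenericNode and GenericGraph classes below are local stand-ins that mirror only the documented members. SimpleNode, SimpleGraph and add_edge are hypothetical names and are not part of the library.

```python
from abc import ABC, abstractmethod
from collections.abc import Hashable


class GenericNode(Hashable):
    """Local stand-in mirroring the documented GenericNode interface."""

    @abstractmethod
    def get_label(self):
        ...


class GenericGraph(ABC):
    """Local stand-in mirroring the documented GenericGraph interface."""

    @property
    @abstractmethod
    def edges(self): ...

    @abstractmethod
    def get_node(self, node_label): ...

    @abstractmethod
    def items(self): ...

    @property
    @abstractmethod
    def node_labels(self): ...

    @property
    @abstractmethod
    def nodes(self): ...


class SimpleNode(GenericNode):
    """Hypothetical node that is identified only by its label."""

    def __init__(self, label):
        self._label = label

    def get_label(self):
        return self._label

    def __hash__(self):
        return hash(self._label)

    def __eq__(self, other):
        return isinstance(other, SimpleNode) and self._label == other._label


class SimpleGraph(GenericGraph):
    """Hypothetical dict-backed graph; labels keep insertion order."""

    def __init__(self):
        self._nodes = {}  # label -> SimpleNode
        self._edges = []  # list of (label_a, label_b)

    def add_edge(self, label_a, label_b):
        # Create missing endpoints on the fly, then record the edge.
        for label in (label_a, label_b):
            self._nodes.setdefault(label, SimpleNode(label))
        self._edges.append((label_a, label_b))

    @property
    def edges(self):
        return iter(self._edges)

    def get_node(self, node_label):
        return self._nodes[node_label]

    def items(self):
        return self._nodes.items()

    @property
    def node_labels(self):
        return self._nodes.keys()

    @property
    def nodes(self):
        return self._nodes.values()


graph = SimpleGraph()
graph.add_edge("main", "parse")
graph.add_edge("main", "exit")
```

Here `graph.items()` yields (node_label, node) tuples, matching the item shape documented for GenericGraph.items().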


class qbindiff.Differ(primary: GenericGraph, secondary: GenericGraph, *, sparsity_ratio: float = 0.6, tradeoff: float = 0.8, epsilon: float = 0.9, maxiter: int = 1000, sparse_row: bool = False)[source]

Bases: object

Abstract class that performs the NAP diffing between two generic graphs.

  • primary – primary graph

  • secondary – secondary graph

  • sparsity_ratio – the sparsity ratio enforced to the similarity matrix of type qbindiff.types.Ratio

  • tradeoff – tradeoff ratio between node similarity (tradeoff=1.0) and edge similarity (tradeoff=0.0) of type qbindiff.types.Ratio

  • epsilon – perturbation parameter to enforce convergence and speed up computation, of type qbindiff.types.Positive. The greater the value, the faster the computation, but the less accurate the result

  • maxiter – maximum number of message passing iterations

  • sparse_row – whether to build the sparse similarity matrix by considering it in its entirety or by processing it row by row
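To make the role of tradeoff concrete, the sketch below scores a candidate matching as a convex combination of the summed node similarity and the number of edges the matching preserves. It only illustrates what such a tradeoff means under the assumption of a simple linear objective. It is not taken from the qbindiff implementation, and matching_score and its arguments are hypothetical.

```python
def matching_score(matching, node_sim, primary_edges, secondary_edges, tradeoff):
    """Score a matching given as {primary_label: secondary_label}.

    Assumed objective (illustration only):
        tradeoff * node_term + (1 - tradeoff) * edge_term
    """
    # Node term: similarity of every matched pair.
    node_term = sum(node_sim[(p, s)] for p, s in matching.items())

    # Edge term: primary edges whose image is also an edge in the secondary.
    secondary_edges = set(secondary_edges)
    edge_term = sum(
        1
        for a, b in primary_edges
        if a in matching
        and b in matching
        and (matching[a], matching[b]) in secondary_edges
    )
    return tradeoff * node_term + (1.0 - tradeoff) * edge_term


node_sim = {("f", "x"): 0.75, ("g", "y"): 0.5}
matching = {"f": "x", "g": "y"}
primary_edges = [("f", "g")]
secondary_edges = [("x", "y")]

# tradeoff=1.0 looks only at node similarity, tradeoff=0.0 only at edges.
only_nodes = matching_score(matching, node_sim, primary_edges, secondary_edges, 1.0)
only_edges = matching_score(matching, node_sim, primary_edges, secondary_edges, 0.0)
```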


alias of float32

compute_matching() Mapping | None[source]

Run the belief propagation algorithm. This method hangs until the computation is done. The resulting matching is returned as a Mapping object.


Mapping between items of the primary and items of the secondary

extract_adjacency_matrix(graph: Graph) tuple[AdjacencyMatrix, dict[Idx, NodeLabel], dict[NodeLabel, Idx]][source]

Returns the adjacency matrix for the graph and the mappings


graph – Graph whose adjacency matrix should be extracted


A tuple containing, in this order: the adjacency matrix of the graph, the map from index to label, and the map from label to index.
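A minimal sketch of the shape of this return value, using plain Python lists in place of the library's AdjacencyMatrix type. The helper name adjacency_from_edges, the directed interpretation of edges, and the choice to index labels in the order they are given are assumptions made for illustration.

```python
def adjacency_from_edges(node_labels, edges):
    """Return (matrix, idx_to_label, label_to_idx) for a directed graph."""
    idx_to_label = dict(enumerate(node_labels))
    label_to_idx = {label: idx for idx, label in idx_to_label.items()}

    size = len(idx_to_label)
    matrix = [[False] * size for _ in range(size)]
    for src, dst in edges:
        # Row = source node index, column = destination node index.
        matrix[label_to_idx[src]][label_to_idx[dst]] = True
    return matrix, idx_to_label, label_to_idx


matrix, idx_to_label, label_to_idx = adjacency_from_edges(
    ["main", "parse", "exit"],
    [("main", "parse"), ("main", "exit")],
)
```

The two dictionaries are inverses of each other, so a matrix index can be translated back to a node label and vice versa.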

get_similarities(primary_idx: list[Idx], secondary_idx: list[Idx]) ArrayLike1D[source]

Returns the similarity scores between the nodes specified as parameters. By default, it uses the similarity matrix. This method is meant to be overridden by subclasses to give more meaningful scores

  • primary_idx – the list of integers that represent nodes inside the primary graph

  • secondary_idx – the list of integers that represent nodes inside the secondary graph


A sequence with the corresponding similarities of the given nodes

matching_iterator() Generator[int, None, None][source]

Run the belief propagation algorithm.


A generator that yields the iteration number until the algorithm either converges or reaches self.maxiter
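A generator of this kind lets the caller drive the computation step by step, for instance to update a progress bar. The sketch below shows the general shape, with a toy fixed-point iteration standing in for belief propagation. iterate_until_converged, step and tolerance are hypothetical names, and numbering the iterations from 1 is an assumption.

```python
def iterate_until_converged(step, state, maxiter=1000, tolerance=1e-9):
    """Yield the iteration number until convergence or maxiter is reached."""
    for iteration in range(1, maxiter + 1):
        new_state = step(state)
        yield iteration
        if abs(new_state - state) < tolerance:
            return  # converged: stop the generator early
        state = new_state


# Toy update that converges quickly (Babylonian iteration for sqrt(2)).
iterations = list(iterate_until_converged(lambda x: 0.5 * (x + 2.0 / x), 1.0))

# Toy update that never converges: the generator stops at maxiter.
capped = list(iterate_until_converged(lambda x: x + 1.0, 0.0, maxiter=5))
```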


Primary graph

process() None[source]

Initialize all the variables for the NAP algorithm.

process_iterator() Iterator[int][source]

Initialize all the variables for the NAP algorithm in an iterative way. It returns an iterator that can be used for tracking the progress.


An iterator of values in the range [0, 1000] used for tracking progress. It might contain more than 1000 elements.

register_prepass(pass_func: GenericPrePass, **extra_args) None[source]

Register a new pre-pass that will operate on the similarity matrix. The passes will be called in the same order as they are registered and each one of them will operate on the output of the previous one.

Warning: A prepass should assign values to the full row or the full column; it should never assign single entries in the matrix.


pass_func – Pass method to apply on the similarity matrix. Example: a pass that first matches import functions.
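The ordering rule can be sketched as a small registry that applies each pass to the matrix left by the previous one. PassPipeline and the two example passes are hypothetical stand-ins written for this illustration; the real GenericPrePass signature is not reproduced here. In keeping with the warning, both passes assign whole rows or whole columns.

```python
class PassPipeline:
    """Hypothetical registry that runs passes in registration order."""

    def __init__(self):
        self._passes = []

    def register(self, pass_func, **extra_args):
        # Keep the extra keyword arguments to forward them at run time.
        self._passes.append((pass_func, extra_args))

    def run(self, sim_matrix):
        # Each pass mutates the matrix in place, so the next pass
        # sees the output of the previous one.
        for pass_func, extra_args in self._passes:
            pass_func(sim_matrix, **extra_args)


def fill_row(sim_matrix, row, value):
    """Assign a full row."""
    sim_matrix[row] = [value] * len(sim_matrix[row])


def scale_column(sim_matrix, col, factor):
    """Rescale a full column."""
    for row in sim_matrix:
        row[col] *= factor


sim = [[0.0, 0.0], [0.0, 0.0]]
pipeline = PassPipeline()
pipeline.register(fill_row, row=0, value=1.0)
pipeline.register(scale_column, col=1, factor=0.5)
pipeline.run(sim)
```

Swapping the two register calls would give a different matrix, which is why registration order matters.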


Secondary graph


class qbindiff.QBinDiff(primary: Program, secondary: Program, distance: Distance = Distance.haussmann, normalize: bool = False, **kwargs)[source]

Bases: Differ

QBinDiff class that provides a high-level interface to trigger a diff between two binaries.

  • primary – The primary binary of type qbindiff.loader.Program

  • secondary – The secondary binary of type qbindiff.loader.Program

  • distance – the distance function used when comparing the feature vectors extracted from the graphs.

  • normalize – Normalize the call graphs of the two programs with a series of heuristics. See normalize() for more information.


alias of float32

export_to_bindiff(filename: str) None[source]

Exports the diffing results in the BinDiff format


filename – Name of the output diffing file

get_similarities(primary_idx: list[int], secondary_idx: list[int]) ArrayLike1D[source]

Returns the similarity scores between the nodes specified as parameters. Uses MinHash fuzzy hash at basic block level to give a similarity score.

  • primary_idx – List of node indexes inside the primary

  • secondary_idx – List of node indexes inside the secondary


A sequence with the corresponding similarities of the given nodes

match_import_functions(sim_matrix: SimMatrix, primary: Program, secondary: Program, primary_mapping: dict[Addr, Idx], secondary_mapping: dict[Addr, Idx]) None[source]

Anchoring phase. This phase considers import functions as anchors to the matching and sets the similarity of these functions to 1. This anchoring phase is necessary to obtain a good match.

  • sim_matrix – The similarity matrix between the primary and secondary, of type qbindiff.types.SimMatrix

  • primary – The primary binary of type qbindiff.loader.Program

  • secondary – The secondary binary of type qbindiff.loader.Program

  • primary_mapping – Mapping between the primary function addresses and their corresponding index

  • secondary_mapping – Mapping between the secondary function addresses and their corresponding index
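One way such an anchoring step could work is sketched below on a plain list-of-lists matrix: for each import name present in both programs, the corresponding row and column are cleared and the single matching entry is set to 1. Matching imports by name, and the helper anchor_imports with its arguments, are assumptions made for illustration rather than the library's actual logic.

```python
def anchor_imports(sim_matrix, primary_imports, secondary_imports):
    """Anchor imports shared by both programs.

    primary_imports / secondary_imports map an import name to the
    row / column index of that function in sim_matrix.
    """
    for name, p_idx in primary_imports.items():
        s_idx = secondary_imports.get(name)
        if s_idx is None:
            continue  # import not present in the secondary
        # Clear the full row of the primary function...
        sim_matrix[p_idx] = [0.0] * len(sim_matrix[p_idx])
        # ...and the full column of the secondary function...
        for row in sim_matrix:
            row[s_idx] = 0.0
        # ...then keep a single candidate with maximal similarity.
        sim_matrix[p_idx][s_idx] = 1.0


sim = [
    [0.4, 0.6, 0.2],
    [0.3, 0.1, 0.7],
]
anchor_imports(sim, {"malloc": 0}, {"malloc": 2, "free": 0})
```

After the call, row 0 and column 2 contain zeros except for the anchored entry, so no other function can be matched with either side of the anchor.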

normalize(program: Program) Program[source]

Normalize the input Program. In some cases, this can raise an exception caused by a thunk function.


program – the program of type qbindiff.loader.Program to normalize.


the normalized program

register_feature_extractor(extractor_class: type[FeatureExtractor], weight: float | None = 1.0, distance: Distance | None = None, **extra_args) None[source]

Register a feature extractor class. This will include the corresponding feature in the similarity matrix computation

  • extractor_class – A feature extractor of type qbindiff.features.extractor

  • weight – Weight associated to the corresponding feature. Default is 1.

  • distance – Distance used only for this feature. It does not make sense to use it with bnb feature, but it can be useful for the WeisfeilerLehman feature.

register_postpass(pass_func: GenericPostPass, **extra_args) None[source]

Register a new post-pass that will operate on the similarity matrix. The passes will be called in the same order as they are registered and each one of them will operate on the output of the previous one.


pass_func – Pass method to apply on the similarity matrix. Example: a Pass that enforces the matches considering certain features extracted.


class qbindiff.DiGraphDiffer(primary: DiGraph, secondary: DiGraph, **kwargs)[source]

Bases: Differ

Differ implementation for two generic networkx.DiGraph

Abstract class that performs the NAP diffing between two generic graphs.

  • primary – primary graph

  • secondary – secondary graph

  • sparsity_ratio – the sparsity ratio enforced to the similarity matrix of type qbindiff.types.Ratio

  • tradeoff – tradeoff ratio between node similarity (tradeoff=1.0) and edge similarity (tradeoff=0.0) of type qbindiff.types.Ratio

  • epsilon – perturbation parameter to enforce convergence and speed up computation, of type qbindiff.types.Positive. The greater the value, the faster the computation, but the less accurate the result

  • maxiter – maximum number of message passing iterations

  • sparse_row – whether to build the sparse similarity matrix by considering it in its entirety or by processing it row by row

gen_sim_matrix(sim_matrix: SimMatrix, *args, **kwargs) None[source]

Initialize the similarity matrix


sim_matrix – The similarity matrix of type qbindiff.types.SimMatrix

