Differs

GenericGraph

class qbindiff.GenericGraph

Bases: object

Abstract class representing a generic graph

abstract property edges: Iterable[tuple[NodeLabel, NodeLabel]]

Iterate over the edges. An edge is a pair (node_label_a, node_label_b).

Returns: An Iterable over the edges.

abstract get_node(node_label: NodeLabel) → GenericNode

Get the node identified by node_label.

Parameters: node_label – the unique identifier of the node
Returns: The node identified by the label

abstract items() → Iterable[tuple[NodeLabel, GenericNode]]

Iterate over the items. Each item is a (node_label, node) pair.

Returns: An Iterable over the items. Each item is a tuple (node_label, node).

abstract property node_labels: Iterable[NodeLabel]

Iterate over the node labels.

Returns: An Iterable over the node labels

abstract property nodes: Iterable[GenericNode]

Iterate over the nodes themselves.

Returns: An Iterable over the nodes
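A minimal sketch of a graph exposing the members listed above, assuming string node labels and an adjacency dict as input. DictGraph and LabeledNode are hypothetical names; the sketch is duck-typed so it runs without qbindiff, whereas real code would subclass qbindiff.GenericGraph and qbindiff.GenericNode so that the abstract members are enforced.

```python
from collections.abc import Iterable, Iterator


class LabeledNode:
    """Hypothetical node whose identity is its label."""

    def __init__(self, label: str) -> None:
        self.label = label

    def get_label(self) -> str:
        return self.label

    def __eq__(self, other: object) -> bool:
        return isinstance(other, LabeledNode) and other.label == self.label

    def __hash__(self) -> int:
        return hash(self.label)


class DictGraph:
    """Hypothetical graph built from {source_label: [target_label, ...]}."""

    def __init__(self, adjacency: dict[str, list[str]]) -> None:
        labels = set(adjacency)
        for targets in adjacency.values():
            labels.update(targets)  # targets without outgoing edges are nodes too
        self._nodes = {label: LabeledNode(label) for label in sorted(labels)}
        self._adjacency = adjacency

    @property
    def edges(self) -> Iterator[tuple[str, str]]:
        # Each edge is a (node_label_a, node_label_b) pair
        for source, targets in self._adjacency.items():
            for target in targets:
                yield (source, target)

    def get_node(self, node_label: str) -> LabeledNode:
        return self._nodes[node_label]

    def items(self) -> Iterable[tuple[str, LabeledNode]]:
        return self._nodes.items()

    @property
    def node_labels(self) -> Iterable[str]:
        return self._nodes.keys()

    @property
    def nodes(self) -> Iterable[LabeledNode]:
        return self._nodes.values()
```

Because edges is a generator-backed property, every access returns a fresh iterator, which keeps repeated traversals cheap and side-effect free.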

GenericNode

class qbindiff.GenericNode

Bases: Hashable

Abstract class representing a generic node

abstract get_label() → NodeLabel

Get the label associated with this node.

Returns: The node label associated with this node
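Because GenericNode derives from Hashable, nodes must be usable as dict keys and set members. A frozen dataclass is one simple way to get consistent __eq__ and __hash__; FunctionNode and its fields are hypothetical, and using the address as the label is an assumption made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionNode:
    """Hypothetical node; frozen=True makes instances hashable by their fields."""

    address: int
    name: str

    def get_label(self) -> int:
        # The address serves as the unique node label in this sketch
        return self.address
```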

Differ

class qbindiff.Differ(primary: GenericGraph, secondary: GenericGraph, *, sparsity_ratio: float = 0.6, tradeoff: float = 0.8, epsilon: float = 0.9, maxiter: int = 1000, sparse_row: bool = False)

Bases: object

Abstract class that performs the NAP (network alignment problem) diffing between two generic graphs.

Parameters:

primary – primary graph
secondary – secondary graph
sparsity_ratio – the sparsity ratio enforced on the similarity matrix, of type qbindiff.types.Ratio
tradeoff – tradeoff ratio between node similarity (tradeoff=1.0) and edge similarity (tradeoff=0.0), of type qbindiff.types.Ratio
epsilon – perturbation parameter that enforces convergence and speeds up the computation, of type qbindiff.types.Positive. The greater the value, the faster but less accurate the computation
maxiter – maximum number of message passing iterations
sparse_row – whether to build the sparse similarity matrix by considering it in its entirety or by processing it row by row
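To make the two ratios concrete, the sketch below shows one plausible reading of them on a small dense matrix: sparsity_ratio as the fraction of lowest entries that are dropped, and tradeoff as a linear blend of a node score and an edge score. Both readings are assumptions chosen for illustration, and sparsify and blended_score are hypothetical helpers; this is not qbindiff's internal formula.

```python
def sparsify(matrix: list[list[float]], sparsity_ratio: float) -> list[list[float]]:
    """Return a copy where the lowest `sparsity_ratio` fraction of entries is zeroed."""
    values = sorted(v for row in matrix for v in row)
    n_drop = round(sparsity_ratio * len(values))
    if n_drop == 0:
        return [row[:] for row in matrix]
    threshold = values[n_drop - 1]
    # Entries tied with the threshold are dropped too, so ties may drop a few extra
    return [[v if v > threshold else 0.0 for v in row] for row in matrix]


def blended_score(node_score: float, edge_score: float, tradeoff: float) -> float:
    """tradeoff=1.0 keeps only the node score, tradeoff=0.0 only the edge score."""
    return tradeoff * node_score + (1.0 - tradeoff) * edge_score
```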

DTYPE: alias of float32

compute_matching() → Mapping | None

Run the belief propagation algorithm. This method blocks until the computation is done. The resulting matching is returned as a Mapping object.

Returns: Mapping between items of the primary and items of the secondary

extract_adjacency_matrix(graph: Graph) → tuple[AdjacencyMatrix, dict[Idx, NodeLabel], dict[NodeLabel, Idx]]

Return the adjacency matrix of the graph together with the index/label mappings.

Parameters: graph – Graph whose adjacency matrix should be extracted
Returns: A tuple containing, in this order: the adjacency matrix of the graph, the map from index to label, and the map from label to index.
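A sketch of what the returned triple looks like, assuming directed edges and a dense list-of-lists matrix (the real AdjacencyMatrix type may differ). extract_adjacency is a hypothetical stand-alone function, not the method itself.

```python
def extract_adjacency(
    node_labels: list[str], edges: list[tuple[str, str]]
) -> tuple[list[list[int]], dict[int, str], dict[str, int]]:
    """Hypothetical stand-in returning (matrix, index -> label, label -> index)."""
    idx_to_label = dict(enumerate(node_labels))
    label_to_idx = {label: idx for idx, label in idx_to_label.items()}
    size = len(node_labels)
    matrix = [[0] * size for _ in range(size)]
    for source, target in edges:
        # Directed edge: the row is the source, the column is the target
        matrix[label_to_idx[source]][label_to_idx[target]] = 1
    return matrix, idx_to_label, label_to_idx
```

The two dictionaries are inverses of each other, so any row or column index of the matrix can be translated back to a node label and vice versa.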

get_similarities(primary_idx: list[Idx], secondary_idx: list[Idx]) → ArrayLike1D

Return the similarity scores between the nodes specified as parameters. By default, it uses the similarity matrix. This method is meant to be overridden by subclasses to provide more meaningful scores.

Parameters:

primary_idx – the list of integers that represent nodes inside the primary graph
secondary_idx – the list of integers that represent nodes inside the secondary graph

Returns:

A sequence with the corresponding similarities of the given nodes
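The default behavior can be sketched as a lookup in the similarity matrix, assuming the two index lists are paired element-wise (the i-th primary index with the i-th secondary index), which is how NumPy indexing sim[primary_idx, secondary_idx] behaves. The function below is a hypothetical stand-alone version working on a list of lists.

```python
def get_similarities(
    sim_matrix: list[list[float]], primary_idx: list[int], secondary_idx: list[int]
) -> list[float]:
    """Hypothetical stand-alone version of the default lookup."""
    if len(primary_idx) != len(secondary_idx):
        raise ValueError("primary_idx and secondary_idx must have the same length")
    # The i-th primary node is compared with the i-th secondary node
    return [sim_matrix[p][s] for p, s in zip(primary_idx, secondary_idx)]
```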

matching_iterator() → Generator[int, None, None]

Run the belief propagation algorithm.

Returns: A generator that yields the iteration number until the algorithm either converges or reaches self.maxiter
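A small helper for consuming the generator while reporting progress. It assumes process() (or process_iterator()) has already been run on the differ, and run_with_progress is a hypothetical name; any iterable of integers works, which keeps the sketch testable without qbindiff.

```python
from collections.abc import Iterable


def run_with_progress(iterations: Iterable[int], report_every: int = 100) -> int:
    """Consume the iterations, printing every `report_every` steps; return the count."""
    count = 0
    for iteration in iterations:
        count += 1
        if count % report_every == 0:
            print(f"belief propagation: iteration {iteration}")
    return count
```

With a configured differ this would be called as run_with_progress(differ.matching_iterator()); the returned count tells how many iterations ran before convergence or the maxiter cap.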

primary: Primary graph

process() → None: Initialize all the variables for the NAP algorithm.

process_iterator() → Iterator[int]

Initialize all the variables for the NAP algorithm incrementally. It returns an iterator that can be used to track the progress.

Returns: An iterator of values in the range [0, 1000] used for tracking progress. It might contain more than 1000 elements.
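Since the iterator may yield more than 1000 values and repeat them, a progress reporter should only act when the displayed value changes. A minimal sketch, with track_progress as a hypothetical helper that maps the [0, 1000] range to whole percentages:

```python
from collections.abc import Iterable


def track_progress(progress_values: Iterable[int]) -> list[int]:
    """Return the distinct whole percentages reached, in order."""
    reached = []
    for value in progress_values:
        # Clamp to [0, 1000] defensively, then scale to a percentage
        percent = min(max(value, 0), 1000) // 10
        if not reached or percent != reached[-1]:
            reached.append(percent)
    return reached
```

In an application the loop body would typically update a progress bar instead of collecting values, for example when iterating over differ.process_iterator().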

register_prepass(pass_func: GenericPrePass, **extra_args) → None

Register a new pre-pass that will operate on the similarity matrix. The passes are called in the same order as they are registered, and each one operates on the output of the previous one.

Warning: A pre-pass should assign values to a full row or a full column; it should never assign single entries of the matrix.

Parameters: pass_func – Pass method to apply on the similarity matrix. For example, a pass that first matches the imported functions.
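The ordering rule can be sketched with a tiny registry. The pass signature used here (the matrix plus the extra keyword arguments, edited in place) is an assumption for illustration, since GenericPrePass is not spelled out in this section; PassPipeline, anchor_row and scale are hypothetical names. anchor_row follows the warning by rewriting a whole row rather than a single entry.

```python
from collections.abc import Callable
from typing import Any

Matrix = list[list[float]]
# Hypothetical pass signature: receives the matrix plus extra keyword arguments
PassFunc = Callable[..., None]


class PassPipeline:
    """Runs passes in registration order, each on the previous pass's output."""

    def __init__(self) -> None:
        self._passes: list[tuple[PassFunc, dict[str, Any]]] = []

    def register(self, pass_func: PassFunc, **extra_args: Any) -> None:
        self._passes.append((pass_func, extra_args))

    def run(self, matrix: Matrix) -> None:
        for pass_func, extra_args in self._passes:
            pass_func(matrix, **extra_args)  # edits the matrix in place


def anchor_row(matrix: Matrix, row: int, col: int) -> None:
    # Rewrite the whole row at once instead of assigning a single entry
    matrix[row] = [1.0 if j == col else 0.0 for j in range(len(matrix[row]))]


def scale(matrix: Matrix, factor: float) -> None:
    for i, values in enumerate(matrix):
        matrix[i] = [v * factor for v in values]
```

Swapping the registration order gives a different result, because scale would then run before anchor_row and the anchored row would keep its 1.0.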

secondary: Secondary graph

QBinDiff

class qbindiff.QBinDiff(primary: Program, secondary: Program, distance: Distance = Distance.haussmann, normalize: bool = False, **kwargs)

Bases: Differ

QBinDiff class that provides a high-level interface to trigger a diff between two binaries.

Parameters:

primary – The primary binary of type qbindiff.loader.Program
secondary – The secondary binary of type qbindiff.loader.Program
distance – the distance function used when comparing the feature vectors extracted from the graphs.
normalize – Normalize the call graphs of the two programs with a series of heuristics. See normalize() for more information.

DTYPE: alias of float32

export_to_bindiff(filename: str) → None

Export the diffing results in the BinDiff format.

Parameters: filename – Name of the output diffing file

get_similarities(primary_idx: list[int], secondary_idx: list[int]) → ArrayLike1D

Return the similarity scores between the nodes specified as parameters. It uses a MinHash fuzzy hash at the basic-block level to compute the similarity score.

Parameters:

primary_idx – List of node indexes inside the primary
secondary_idx – List of node indexes inside the secondary

Returns:

A sequence with the corresponding similarities of the given nodes
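MinHash estimates the Jaccard similarity of two sets by comparing, for several seeded hash functions, the minimum hash over each set. The sketch below is a generic MinHash, not qbindiff's implementation; treating a basic block as a set of string tokens (for example instruction mnemonics) is an assumption, and the helper names are hypothetical.

```python
import hashlib


def minhash_signature(tokens: set[str], num_perm: int = 64) -> list[int]:
    """Generic MinHash: for each seeded hash function keep the minimum over the set."""
    if not tokens:
        raise ValueError("cannot build a MinHash signature of an empty set")
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # a different salt acts as a different hash
        signature.append(
            min(
                int.from_bytes(
                    hashlib.blake2b(tok.encode(), digest_size=8, salt=salt).digest(),
                    "little",
                )
                for tok in tokens
            )
        )
    return signature


def minhash_similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of agreeing minima, an estimate of the Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

More hash functions (num_perm) reduce the variance of the estimate at the cost of a longer signature.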

match_import_functions(sim_matrix: SimMatrix, primary: Program, secondary: Program, primary_mapping: dict[Addr, Idx], secondary_mapping: dict[Addr, Idx]) → None

Anchoring phase. This phase uses the imported functions as anchors for the matching and sets their similarity to 1. This anchoring phase is necessary to obtain a good match.

Parameters:

sim_matrix – The similarity matrix between the primary and the secondary, of type qbindiff.types.SimMatrix
primary – The primary binary of type qbindiff.loader.Program
secondary – The secondary binary of type qbindiff.loader.Program
primary_mapping – Mapping between the primary function addresses and their corresponding index
secondary_mapping – Mapping between the secondary function addresses and their corresponding index
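A sketch of the anchoring idea on a dense matrix, assuming imports are matched by name and that primary_imports and secondary_imports dictionaries (import name to address) are available; these inputs and anchor_imports are hypothetical. Zeroing the rest of the anchored row and column is also an assumption, chosen to respect the full-row/full-column rule stated for pre-passes.

```python
def anchor_imports(
    sim_matrix: list[list[float]],
    primary_imports: dict[str, int],
    secondary_imports: dict[str, int],
    primary_mapping: dict[int, int],
    secondary_mapping: dict[int, int],
) -> None:
    """Hypothetical anchoring: an import present by name in both binaries gets similarity 1."""
    n_cols = len(sim_matrix[0]) if sim_matrix else 0
    for name, p_addr in primary_imports.items():
        s_addr = secondary_imports.get(name)
        if s_addr is None:
            continue  # import only present in the primary
        i, j = primary_mapping[p_addr], secondary_mapping[s_addr]
        # Rewrite the full row i and the full column j around the anchor (i, j)
        sim_matrix[i] = [1.0 if col == j else 0.0 for col in range(n_cols)]
        for r, row in enumerate(sim_matrix):
            if r != i:
                row[j] = 0.0
```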

normalize(program: Program) → Program

Normalize the input Program. In some cases this can raise an exception caused by a thunk function.

Parameters: program – the program of type qbindiff.loader.Program to normalize
Returns: the normalized program

register_feature_extractor(extractor_class: type[FeatureExtractor], weight: float | None = 1.0, distance: Distance | None = None, **extra_args) → None

Register a feature extractor class. The corresponding feature will be included in the similarity matrix computation.

Parameters:

extractor_class – A feature extractor class (a subclass of FeatureExtractor, defined in qbindiff.features.extractor)
weight – Weight associated with the corresponding feature. Default is 1.0.
distance – Distance used only for this feature. It does not make sense to use it with the bnb feature, but it can be useful for the WeisfeilerLehman feature.
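How per-feature distances and weights are combined is not described here. The sketch below assumes a weighted average of per-feature similarities, each derived from a normalized Canberra distance; the scheme, the feature names (block_count, mnemonic_hist) and the helper functions are hypothetical.

```python
def canberra(u: list[float], v: list[float]) -> float:
    """Canberra distance; terms where both coordinates are 0 contribute nothing."""
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom:
            total += abs(a - b) / denom
    return total


def combined_similarity(
    features_a: dict[str, list[float]],
    features_b: dict[str, list[float]],
    weights: dict[str, float],
) -> float:
    """Hypothetical scheme: weighted average of per-feature similarities."""
    total_weight = sum(weights.values())
    score = 0.0
    for name, weight in weights.items():
        u, v = features_a[name], features_b[name]
        # Each Canberra term is in [0, 1], so dividing by the length normalizes to [0, 1]
        score += weight * (1.0 - canberra(u, v) / len(u))
    return score / total_weight
```

Raising one feature's weight pulls the combined score toward that feature's similarity, which is the intuition behind the weight parameter.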

register_postpass(pass_func: GenericPostPass, **extra_args) → None

Register a new post-pass that will operate on the similarity matrix. The passes are called in the same order as they are registered, and each one operates on the output of the previous one.

Parameters: pass_func – Pass method to apply on the similarity matrix. For example, a pass that enforces matches based on certain extracted features.

DiGraphDiffer

class qbindiff.DiGraphDiffer(primary: DiGraph, secondary: DiGraph, **kwargs)

Bases: Differ

Differ implementation for two generic networkx.DiGraph objects.

Performs the NAP diffing between two generic graphs.

Parameters:

primary – primary graph
secondary – secondary graph
sparsity_ratio – the sparsity ratio enforced on the similarity matrix, of type qbindiff.types.Ratio
tradeoff – tradeoff ratio between node similarity (tradeoff=1.0) and edge similarity (tradeoff=0.0), of type qbindiff.types.Ratio
epsilon – perturbation parameter that enforces convergence and speeds up the computation, of type qbindiff.types.Positive. The greater the value, the faster but less accurate the computation
maxiter – maximum number of message passing iterations
sparse_row – whether to build the sparse similarity matrix by considering it in its entirety or by processing it row by row

gen_sim_matrix(sim_matrix: SimMatrix, *args, **kwargs) → None

Initialize the similarity matrix.

Parameters: sim_matrix – The similarity matrix of type qbindiff.types.SimMatrix
Returns: None
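For plain DiGraphs there are no extracted features, so a subclass could seed the matrix with a structural heuristic. The sketch below fills a dense matrix in place with a similarity based on in-degree and out-degree differences; the heuristic and fill_degree_similarity are hypothetical and not necessarily what DiGraphDiffer does.

```python
from collections import Counter


def fill_degree_similarity(
    sim_matrix: list[list[float]],
    primary_edges: list[tuple[str, str]],
    secondary_edges: list[tuple[str, str]],
    primary_labels: list[str],
    secondary_labels: list[str],
) -> None:
    """Hypothetical initializer: similar in/out degrees give a higher score."""

    def degrees(edges: list[tuple[str, str]]) -> tuple[Counter, Counter]:
        out_deg, in_deg = Counter(), Counter()
        for source, target in edges:
            out_deg[source] += 1
            in_deg[target] += 1
        return out_deg, in_deg

    p_out, p_in = degrees(primary_edges)
    s_out, s_in = degrees(secondary_edges)
    for i, p in enumerate(primary_labels):
        for j, s in enumerate(secondary_labels):
            gap = abs(p_out[p] - s_out[s]) + abs(p_in[p] - s_in[s])
            sim_matrix[i][j] = 1.0 / (1.0 + gap)  # 1.0 for identical degrees
```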