Differs

GenericGraph

class qbindiff.GenericGraph[source]

Bases: object

Abstract class representing a generic graph

abstract property edges: Iterator[tuple[Any, Any]]

Return an iterator over the edges. An edge is a pair (node_label_a, node_label_b)

abstract get_node(node_label: Any)[source]

Returns the node identified by the node_label

abstract items() Iterator[tuple[Any, Any]][source]

Return an iterator over the items. Each item is a pair (node_label, node)

abstract property node_labels: Iterator[Any]

Return an iterator over the node labels

abstract property nodes: Iterator[Any]

Return an iterator over the nodes
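As an illustration of the interface above, the following is a minimal sketch of a dictionary-backed graph that exposes the five documented members. The class name DictGraph and its constructor are hypothetical; a real implementation would presumably subclass qbindiff.GenericGraph, which this stand-alone example deliberately does not import.

```python
from __future__ import annotations

from typing import Any, Iterator


class DictGraph:
    """Toy directed graph mirroring the members documented for GenericGraph.

    Assumption: this class only duck-types the documented interface; it is
    not derived from qbindiff.GenericGraph.
    """

    def __init__(self, nodes: dict[Any, Any], edges: list[tuple[Any, Any]]):
        self._nodes = dict(nodes)  # node_label -> node object
        self._edges = list(edges)  # (node_label_a, node_label_b) pairs

    def items(self) -> Iterator[tuple[Any, Any]]:
        """Iterate over (node_label, node) pairs."""
        return iter(self._nodes.items())

    def get_node(self, node_label: Any) -> Any:
        """Return the node identified by node_label."""
        return self._nodes[node_label]

    @property
    def node_labels(self) -> Iterator[Any]:
        return iter(self._nodes.keys())

    @property
    def nodes(self) -> Iterator[Any]:
        return iter(self._nodes.values())

    @property
    def edges(self) -> Iterator[tuple[Any, Any]]:
        return iter(self._edges)


# Two nodes labelled by address, with one edge between them.
graph = DictGraph({0x1000: "main", 0x2000: "helper"}, [(0x1000, 0x2000)])
```

Labels can be any hashable value here; addresses are used only because they are a natural label for functions in a binary.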

Differ

class qbindiff.Differ(primary: GenericGraph, secondary: GenericGraph, *, sparsity_ratio: float = 0.75, tradeoff: float = 0.75, epsilon: float = 0.5, maxiter: int = 1000, normalize: bool = False, sparse_row: bool = False)[source]

Bases: object

Abstract class that performs the NAP diffing between two generic graphs.

Parameters:
  • primary – primary graph

  • secondary – secondary graph

  • sparsity_ratio – the sparsity ratio enforced on the similarity matrix, of type qbindiff.types.Ratio

  • tradeoff – tradeoff ratio between node similarity (tradeoff=1.0) and edge similarity (tradeoff=0.0), of type qbindiff.types.Ratio

  • epsilon – perturbation parameter to enforce convergence and speed up computation, of type qbindiff.types.Positive. The greater the value, the faster the computation, but the less accurate the result

  • maxiter – maximum number of message passing iterations

  • sparse_row – Whether to build the sparse similarity matrix considering its entirety or processing it row per row

DTYPE

alias of float32

compute_matching() Mapping[source]

Run the belief propagation algorithm. This method blocks until the computation is done. The resulting matching is returned as a Mapping object.

Returns:

Mapping between items of the primary and items of the secondary

extract_adjacency_matrix(graph: GenericGraph) tuple[ndarray, dict[int, int], dict[int, int]][source]

Returns the adjacency matrix for the graph and the mappings

Parameters:

graph – Graph whose adjacency matrix should be extracted

Returns:

A tuple containing, in this order: the adjacency matrix of the graph, the map from index to label, and the map from label to index.
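The shape of this return value can be sketched as follows. This is a simplified illustration, not the library's implementation: the helper name adjacency_with_mappings is hypothetical, and a nested list stands in for the ndarray so that the example has no dependencies.

```python
def adjacency_with_mappings(node_labels, edges):
    """Build (adjacency matrix, index -> label, label -> index).

    Assumption: indices are assigned in the iteration order of node_labels,
    and the matrix is a plain list of lists rather than a NumPy array.
    """
    labels = list(node_labels)
    idx_to_label = dict(enumerate(labels))
    label_to_idx = {label: idx for idx, label in idx_to_label.items()}

    size = len(labels)
    matrix = [[0] * size for _ in range(size)]
    for src, dst in edges:
        # Entry (i, j) is 1 when there is an edge from node i to node j.
        matrix[label_to_idx[src]][label_to_idx[dst]] = 1
    return matrix, idx_to_label, label_to_idx


matrix, idx_to_label, label_to_idx = adjacency_with_mappings(
    [0x1000, 0x2000, 0x3000],
    [(0x1000, 0x2000), (0x2000, 0x3000)],
)
```

The two maps are inverses of each other, which lets a caller move between matrix indices and node labels in either direction.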

get_similarities(primary_idx: list[int], secondary_idx: list[int]) list[float][source]

Returns the similarity scores between the nodes specified as parameter. By default, it uses the similarity matrix. This method is meant to be overridden by subclasses to give more meaningful scores

Parameters:
  • primary_idx – the List of integers that represent nodes inside the primary graph

  • secondary_idx – the List of integers that represent nodes inside the secondary graph

Returns:

the similarity scores between the specified nodes

normalize(graph: GenericGraph) GenericGraph[source]

Custom function that normalizes the input graph. This method is meant to be overridden by a subclass.

Parameters:

graph – graph to normalize

Returns:

normalized graph

primary

Primary graph

process() None[source]

Initialize all the variables for the NAP algorithm.

register_postpass(pass_func: GenericPass, **extra_args) None[source]

Register a new post-pass that will operate on the similarity matrix. The passes will be called in the same order as they are registered and each one of them will operate on the output of the previous one.

Parameters:

pass_func – Pass method to apply on the similarity matrix. Example: a pass that extracts graph features.

register_prepass(pass_func: GenericPass, **extra_args) None[source]

Register a new pre-pass that will operate on the similarity matrix. The passes will be called in the same order as they are registered and each one of them will operate on the output of the previous one.

Warning: A prepass should assign values to the full row or the full column; it should never assign single entries in the matrix.

Parameters:

pass_func – Pass method to apply on the similarity matrix. Example: a pass that first matches import functions.
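The ordering guarantee described for pre-passes and post-passes can be sketched with a small stand-alone pipeline. Everything below is hypothetical: PassPipeline, anchor_row and scale_column are illustrative names, the pass signature is an assumption (the GenericPass signature is not documented here), and passes are assumed to modify the matrix in place so that each one sees the result of the previous one.

```python
class PassPipeline:
    """Hypothetical registry that runs passes in registration order."""

    def __init__(self):
        self._passes = []  # list of (pass_func, extra_args)

    def register(self, pass_func, **extra_args):
        self._passes.append((pass_func, extra_args))

    def run(self, sim_matrix):
        # Each pass mutates sim_matrix in place, so the next pass
        # operates on the output of the previous one.
        for pass_func, extra_args in self._passes:
            pass_func(sim_matrix, **extra_args)


def anchor_row(sim_matrix, row, col):
    """Overwrite a full row: 1.0 at the anchor column, 0.0 elsewhere."""
    width = len(sim_matrix[row])
    sim_matrix[row] = [1.0 if j == col else 0.0 for j in range(width)]


def scale_column(sim_matrix, col, factor):
    """Rescale a full column by a constant factor."""
    for row in sim_matrix:
        row[col] *= factor


pipeline = PassPipeline()
pipeline.register(anchor_row, row=0, col=1)
pipeline.register(scale_column, col=1, factor=0.5)

sim = [[0.2, 0.4], [0.6, 0.8]]
pipeline.run(sim)
```

Both example passes write a whole row or a whole column, in line with the warning above about not assigning isolated entries.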

run_passes() None[source]

Run all the passes that have been previously registered.

secondary

Secondary graph

QBinDiff

class qbindiff.QBinDiff(primary: Program, secondary: Program, distance: Distance = Distance.canberra, **kwargs)[source]

Bases: Differ

QBinDiff class that provides a high-level interface to trigger a diff between two binaries.

Parameters:
  • primary – The primary binary, of type qbindiff.loader.Program

  • secondary – The secondary binary, of type qbindiff.loader.Program

  • distance – the distance function used when comparing the feature vectors extracted from the graphs. Default is a qbindiff.types.Distance initialized to ‘canberra’.
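For reference, the Canberra distance named as the default can be written out in a few lines. This is a plain-Python sketch of the standard definition, the sum over coordinates of |a - b| / (|a| + |b|), and not the code path used by the library; the function name canberra and the convention of skipping coordinates where both values are zero are choices made for this example.

```python
def canberra(u, v):
    """Canberra distance between two equal-length feature vectors.

    Coordinates where both values are zero contribute nothing, which
    avoids a 0/0 division (an assumption of this sketch).
    """
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    total = 0.0
    for a, b in zip(u, v):
        denominator = abs(a) + abs(b)
        if denominator:
            total += abs(a - b) / denominator
    return total


# Identical vectors are at distance 0; each coordinate adds at most 1.
same = canberra([1.0, 2.0, 0.0], [1.0, 2.0, 0.0])
different = canberra([1.0, 2.0, 0.0], [1.0, 0.0, 0.0])
```

Because every term is normalized by the magnitude of its coordinates, small features are not drowned out by large ones, which is one plausible reason for preferring it when feature counts vary widely.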

DTYPE

alias of float32

export_to_bindiff(filename: str) None[source]

Exports the diffing results in the BinDiff format

Parameters:

filename – Name of the output diffing file

Returns:

None

get_similarities(primary_idx: list[int], secondary_idx: list[int]) list[float][source]

Returns the similarity scores between the nodes specified as parameter. Uses MinHash fuzzy hash at basic block level to give a similarity score.

Parameters:
  • primary_idx – List of node indexes inside the primary

  • secondary_idx – List of node indexes inside the secondary

Returns:

The list of corresponding similarities between the given nodes
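To give an idea of what a MinHash-based score looks like, here is a generic sketch of the technique over sets of string tokens. It is not the library's implementation: the function names, the use of salted BLAKE2b as the hash family, the signature length of 64, and the choice of instruction mnemonics as tokens are all assumptions made for illustration.

```python
import hashlib


def minhash_signature(tokens, num_hashes=64):
    """Return one minimum hash value per salted hash function."""
    unique_tokens = set(tokens)
    if not unique_tokens:
        raise ValueError("cannot build a signature from an empty token set")
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(4, "little")  # a different salt per hash function
        signature.append(
            min(
                hashlib.blake2b(tok.encode(), digest_size=8, salt=salt).digest()
                for tok in unique_tokens
            )
        )
    return signature


def minhash_similarity(sig_a, sig_b):
    """Fraction of positions where two signatures agree.

    This fraction estimates the Jaccard similarity of the underlying sets.
    """
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Hypothetical token sets standing in for the contents of two basic blocks.
sig_a = minhash_signature(["push", "mov", "call", "ret"])
sig_b = minhash_signature(["push", "mov", "jmp", "ret"])
score = minhash_similarity(sig_a, sig_b)
```

The score is an estimate, so it varies with the number of hash functions; more functions give a tighter estimate at a higher cost.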

match_import_functions(sim_matrix: ndarray, primary: Program, secondary: Program, primary_mapping: dict, secondary_mapping: dict) None[source]

Anchoring phase. This phase considers import functions as anchors for the matching and sets the similarity of these functions to 1. This anchoring phase is necessary to obtain a good match.

Parameters:
  • sim_matrix – The similarity matrix between the primary and secondary, of type qbindiff.types.SimMatrix

  • primary – The primary binary, of type qbindiff.loader.Program

  • secondary – The secondary binary, of type qbindiff.loader.Program

  • primary_mapping – Mapping between the primary function addresses and their corresponding index

  • secondary_mapping – Mapping between the secondary function addresses and their corresponding index

Returns:

None
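The anchoring idea can be sketched on a plain nested-list matrix. This is an illustration under stated assumptions rather than the method's actual body: the function anchor_imports is hypothetical, imports are assumed to be paired by identical name, and the inputs are simplified to name-to-index dictionaries instead of Program objects and address-to-index mappings.

```python
def anchor_imports(sim_matrix, primary_imports, secondary_imports):
    """Pin same-named imports together in the similarity matrix.

    primary_imports maps import name -> row index (hypothetical input).
    secondary_imports maps import name -> column index (hypothetical input).
    The matrix is modified in place and nothing is returned.
    """
    for name, row in primary_imports.items():
        col = secondary_imports.get(name)
        if col is None:
            continue  # no counterpart with the same name in the secondary
        # Clear the whole row and the whole column, then set the anchor to 1.
        sim_matrix[row] = [0.0] * len(sim_matrix[row])
        for other_row in sim_matrix:
            other_row[col] = 0.0
        sim_matrix[row][col] = 1.0


sim = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
anchor_imports(sim, {"malloc": 0}, {"malloc": 2, "free": 0})
```

Clearing the row and column before setting the anchor removes every competing candidate for the two anchored functions, which is what makes them act as fixed points for the rest of the matching.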

normalize(program: Program) Program[source]

Normalize the input Program. In some cases, this can raise an exception, caused by a thunk function.

Parameters:

program – the program to normalize, of type qbindiff.loader.Program

Returns:

the normalized program

register_feature_extractor(extractor_class: type[FeatureExtractor], weight: float | None = 1.0, distance: Distance | None = None, **extra_args) None[source]

Register a feature extractor class. This will include the corresponding feature in the similarity matrix computation

Parameters:
  • extractor_class – A feature extractor of type qbindiff.features.extractor

  • weight – Weight associated to the corresponding feature. Default is 1.

  • distance – Distance used only for this feature. It does not make sense to use it with bnb feature, but it can be useful for the WeisfeilerLehman feature.

DiGraphDiffer

class qbindiff.DiGraphDiffer(primary: DiGraph, secondary: DiGraph, **kwargs)[source]

Bases: Differ

Differ implementation for two generic networkx.DiGraph

Abstract class that performs the NAP diffing between two generic graphs.

Parameters:
  • primary – primary graph

  • secondary – secondary graph

  • sparsity_ratio – the sparsity ratio enforced on the similarity matrix, of type qbindiff.types.Ratio

  • tradeoff – tradeoff ratio between node similarity (tradeoff=1.0) and edge similarity (tradeoff=0.0), of type qbindiff.types.Ratio

  • epsilon – perturbation parameter to enforce convergence and speed up computation, of type qbindiff.types.Positive. The greater the value, the faster the computation, but the less accurate the result

  • maxiter – maximum number of message passing iterations

  • sparse_row – Whether to build the sparse similarity matrix considering its entirety or processing it row per row

gen_sim_matrix(sim_matrix: ndarray, *args, **kwargs) None[source]

Initialize the similarity matrix

Parameters:

sim_matrix – The similarity matrix, of type qbindiff.types.SimMatrix

Returns:

None