Differs
GenericGraph
- class qbindiff.GenericGraph
Bases: object
Abstract class representing a generic graph.
- abstract property edges: Iterator[tuple[Any, Any]]
Return an iterator over the edges. An edge is a pair (node_label_a, node_label_b).
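The edges contract can be illustrated with a minimal stand-in class, written without importing qbindiff so that it runs on its own. The class name EdgeListGraph, its constructor and its nodes property are hypothetical. Only the edges property mirrors the interface documented above, and a real GenericGraph subclass may have to implement further abstract members not listed in this section.

```python
from collections.abc import Iterator
from typing import Any


class EdgeListGraph:
    """Hypothetical stand-in for a GenericGraph subclass.

    Only `edges` mirrors the documented interface; the constructor and
    `nodes` are illustrative assumptions.
    """

    def __init__(self, edges: list[tuple[Any, Any]]) -> None:
        self._edges = list(edges)

    @property
    def nodes(self) -> list[Any]:
        # Node labels in first-seen order (not part of the documented API)
        seen: dict[Any, None] = {}
        for a, b in self._edges:
            seen.setdefault(a)
            seen.setdefault(b)
        return list(seen)

    @property
    def edges(self) -> Iterator[tuple[Any, Any]]:
        # Each edge is a pair (node_label_a, node_label_b), as documented
        return iter(self._edges)


g = EdgeListGraph([("main", "parse"), ("parse", "lex")])
print(list(g.edges))  # [('main', 'parse'), ('parse', 'lex')]
```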
Differ
- class qbindiff.Differ(primary: GenericGraph, secondary: GenericGraph, *, sparsity_ratio: float = 0.75, tradeoff: float = 0.75, epsilon: float = 0.5, maxiter: int = 1000, normalize: bool = False, sparse_row: bool = False)
Bases: object
Abstract class that performs the NAP (Network Alignment Problem) diffing between two generic graphs.
- Parameters:
primary – primary graph
secondary – secondary graph
sparsity_ratio – the sparsity ratio enforced on the similarity matrix, of type qbindiff.types.Ratio
tradeoff – tradeoff ratio between node similarity (tradeoff=1.0) and edge similarity (tradeoff=0.0), of type qbindiff.types.Ratio
epsilon – perturbation parameter that enforces convergence and speeds up the computation, of type qbindiff.types.Positive. The larger the value, the faster the computation, but the less accurate the result
maxiter – maximum number of message-passing iterations
sparse_row – whether to build the sparse similarity matrix by considering the matrix in its entirety or by processing it row by row
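The effect of sparsity_ratio and sparse_row can be sketched on a small dense matrix. This is a minimal sketch under an assumption the documentation does not spell out: the ratio is read as the fraction of entries set to zero, keeping the largest scores. The function name sparsify is hypothetical and the code does not reproduce qbindiff's own implementation.

```python
import math


def sparsify(matrix: list[list[float]], ratio: float, by_row: bool) -> list[list[float]]:
    """Zero out roughly `ratio` of the entries, keeping the largest ones.

    Assumption: the ratio is the fraction of entries dropped; ties at the
    threshold are all kept.
    """

    def threshold(values: list[float]) -> float:
        # Number of entries to keep (at least one)
        keep = max(1, len(values) - math.floor(ratio * len(values)))
        return sorted(values, reverse=True)[keep - 1]

    if by_row:
        # Analogue of sparse_row=True: each row gets its own threshold
        cuts = [threshold(row) for row in matrix]
    else:
        # Analogue of sparse_row=False: one threshold over the entire matrix
        cut = threshold([v for row in matrix for v in row])
        cuts = [cut] * len(matrix)
    return [[v if v >= c else 0.0 for v in row] for row, c in zip(matrix, cuts)]


m = [[0.9, 0.8, 0.7, 0.6], [0.4, 0.3, 0.2, 0.1]]
print(sparsify(m, 0.5, by_row=False))  # [[0.9, 0.8, 0.7, 0.6], [0.0, 0.0, 0.0, 0.0]]
print(sparsify(m, 0.5, by_row=True))   # [[0.9, 0.8, 0.0, 0.0], [0.4, 0.3, 0.0, 0.0]]
```

With a single global threshold the second row loses every candidate, while the row-by-row variant keeps the best scores of each row.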
- DTYPE
alias of float32
- compute_matching() → Mapping
Run the belief propagation algorithm. This method blocks until the computation is done. The resulting matching is returned as a Mapping object.
- Returns:
Mapping between items of the primary and items of the secondary
- extract_adjacency_matrix(graph: GenericGraph) → tuple[ndarray, dict[int, int], dict[int, int]]
Returns the adjacency matrix of the graph, together with the mappings between matrix indices and node labels.
- Parameters:
graph – graph whose adjacency matrix should be extracted
- Returns:
A tuple containing, in this order: the adjacency matrix of the graph, the map from index to label, and the map from label to index.
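The documented return triple can be sketched with plain lists instead of a NumPy ndarray. The helper extract_adjacency and its nodes/edges arguments are hypothetical, and the function addresses used as labels are made-up example values.

```python
from typing import Any


def extract_adjacency(
    nodes: list[Any], edges: list[tuple[Any, Any]]
) -> tuple[list[list[int]], dict[int, Any], dict[Any, int]]:
    """Sketch of the documented triple: (matrix, index -> label, label -> index)."""
    # Assign a dense matrix index to every node label
    label_to_index = {label: i for i, label in enumerate(nodes)}
    index_to_label = {i: label for label, i in label_to_index.items()}
    n = len(nodes)
    matrix = [[0] * n for _ in range(n)]
    for a, b in edges:
        # A directed edge a -> b sets row a, column b
        matrix[label_to_index[a]][label_to_index[b]] = 1
    return matrix, index_to_label, label_to_index


adj, idx2lbl, lbl2idx = extract_adjacency(
    [0x1000, 0x1040, 0x10A0], [(0x1000, 0x1040), (0x1040, 0x10A0)]
)
print(adj)  # [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
```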
- get_similarities(primary_idx: list[int], secondary_idx: list[int]) → list[float]
Returns the similarity scores between the nodes specified as parameters. By default, it uses the similarity matrix. This method is meant to be overridden by subclasses to provide more meaningful scores.
- Parameters:
primary_idx – list of integers that represent nodes inside the primary graph
secondary_idx – list of integers that represent nodes inside the secondary graph
- Returns:
the similarity scores between the specified nodes
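A sketch of the default behaviour described above, assuming the two index lists are paired element-wise (the k-th primary index is compared with the k-th secondary index), which is how the QBinDiff override describes its result. The standalone function and its explicit sim_matrix argument are hypothetical simplifications.

```python
def get_similarities(
    sim_matrix: list[list[float]], primary_idx: list[int], secondary_idx: list[int]
) -> list[float]:
    """Read one score per (primary_idx[k], secondary_idx[k]) pair.

    Assumption: the two index lists are paired element-wise.
    """
    if len(primary_idx) != len(secondary_idx):
        raise ValueError("index lists must have the same length")
    return [sim_matrix[i][j] for i, j in zip(primary_idx, secondary_idx)]


sim = [[0.9, 0.1], [0.2, 0.7]]
print(get_similarities(sim, [0, 1], [0, 1]))  # [0.9, 0.7]
```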
- normalize(graph: GenericGraph) → GenericGraph
Custom method that normalizes the input graph. This method is meant to be overridden by a subclass.
- Parameters:
graph – graph to normalize
- Returns:
the normalized graph
- primary
Primary graph
- register_postpass(pass_func: GenericPass, **extra_args) → None
Register a new post-pass that will operate on the similarity matrix. The passes are called in the order in which they are registered, and each one operates on the output of the previous one.
- Parameters:
pass_func – pass method to apply on the similarity matrix. For example, a pass that extracts graph features.
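The ordering rule for passes can be sketched with a small hypothetical registry. The pass signature (sim_matrix, primary, secondary, primary_mapping, secondary_mapping, **extra_args) is an assumption modelled on QBinDiff.match_import_functions; PassPipeline, scale and clip are illustrative names, not part of qbindiff.

```python
from typing import Any, Callable

# Assumed pass signature, modelled on QBinDiff.match_import_functions
Pass = Callable[..., None]


class PassPipeline:
    """Hypothetical registry mimicking the documented ordering semantics."""

    def __init__(self) -> None:
        self._passes: list[tuple[Pass, dict[str, Any]]] = []

    def register(self, pass_func: Pass, **extra_args: Any) -> None:
        # Keep registration order; extra_args are forwarded on every call
        self._passes.append((pass_func, extra_args))

    def run(self, sim_matrix, primary, secondary, primary_mapping, secondary_mapping) -> None:
        # Each pass mutates sim_matrix in place, so it sees the previous pass's output
        for pass_func, extra in self._passes:
            pass_func(sim_matrix, primary, secondary, primary_mapping, secondary_mapping, **extra)


def scale(sim_matrix, primary, secondary, pm, sm, factor: float = 1.0) -> None:
    # Multiply every score by `factor`
    for row in sim_matrix:
        for j, v in enumerate(row):
            row[j] = v * factor


def clip(sim_matrix, primary, secondary, pm, sm, upper: float = 1.0) -> None:
    # Cap every score at `upper`
    for row in sim_matrix:
        for j, v in enumerate(row):
            row[j] = min(v, upper)


pipeline = PassPipeline()
pipeline.register(scale, factor=2.0)
pipeline.register(clip, upper=1.0)
m = [[0.3, 0.6], [0.1, 0.5]]
pipeline.run(m, None, None, {}, {})
print(m)  # [[0.6, 1.0], [0.2, 1.0]]
```

Registering clip before scale would instead leave 1.2 in the first row, which is why registration order matters.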
- register_prepass(pass_func: GenericPass, **extra_args) → None
Register a new pre-pass that will operate on the similarity matrix. The passes are called in the order in which they are registered, and each one operates on the output of the previous one.
Warning: a pre-pass should assign values to a full row or a full column; it should never assign single entries of the matrix.
- Parameters:
pass_func – pass method to apply on the similarity matrix. For example, a pass that first matches import functions.
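A pre-pass that respects the warning above can be sketched as follows: to anchor one pair, it rewrites the whole row of the primary node and the whole column of the secondary node instead of touching a single entry. The function anchor_pair is hypothetical.

```python
def anchor_pair(sim_matrix: list[list[float]], i: int, j: int) -> None:
    """Hypothetical pre-pass step forcing primary node i to match secondary node j.

    It writes the full row i and the full column j, never a lone entry.
    """
    for col in range(len(sim_matrix[i])):
        sim_matrix[i][col] = 0.0  # clear every other candidate for primary i
    for row in sim_matrix:
        row[j] = 0.0  # clear every other candidate for secondary j
    sim_matrix[i][j] = 1.0  # the anchored pair gets the maximal score


m = [[0.4, 0.5, 0.1], [0.3, 0.2, 0.9]]
anchor_pair(m, 0, 1)
print(m)  # [[0.0, 1.0, 0.0], [0.3, 0.0, 0.9]]
```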
- secondary
Secondary graph
QBinDiff
- class qbindiff.QBinDiff(primary: Program, secondary: Program, distance: Distance = Distance.canberra, **kwargs)
Bases: Differ
QBinDiff class that provides a high-level interface to trigger a diff between two binaries.
- Parameters:
primary – the primary binary, of type qbindiff.loader.Program
secondary – the secondary binary, of type qbindiff.loader.Program
distance – the distance function used when comparing the feature vectors extracted from the graphs. Default is a qbindiff.types.Distance initialized to ‘canberra’.
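For reference, the Canberra distance between two feature vectors u and v is the sum over i of |u_i - v_i| / (|u_i| + |v_i|). The sketch below implements that formula in plain Python, treating terms where both coordinates are zero as 0 (the convention used by scipy.spatial.distance.canberra). It is not qbindiff's own code, and the feature vectors are made-up.

```python
def canberra(u: list[float], v: list[float]) -> float:
    """Canberra distance: sum of |u_i - v_i| / (|u_i| + |v_i|).

    Terms where both coordinates are 0 contribute 0.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for a, b in zip(u, v):
        denom = abs(a) + abs(b)
        if denom != 0:
            total += abs(a - b) / denom
    return total


# Hypothetical feature vectors for two functions
print(canberra([3.0, 0.0, 5.0], [1.0, 0.0, 5.0]))  # 0.5
```

Because each term is divided by the magnitude of its coordinates, a difference between small counts weighs as much as a proportionally equal difference between large ones.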
- DTYPE
alias of float32
- export_to_bindiff(filename: str) → None
Exports the diffing results in the BinDiff format.
- Parameters:
filename – Name of the output diffing file
- Returns:
None
- get_similarities(primary_idx: list[int], secondary_idx: list[int]) → list[float]
Returns the similarity scores between the nodes specified as parameters. Uses a MinHash fuzzy hash at the basic-block level to compute each similarity score.
- Parameters:
primary_idx – List of node indexes inside the primary
secondary_idx – List of node indexes inside the secondary
- Returns:
The list of corresponding similarities between the given nodes
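MinHash estimates the Jaccard similarity of two sets from the fraction of hash functions whose minimum values agree. The sketch below is a generic MinHash, not QBinDiff's implementation: encoding each basic block as a string of mnemonics, the seeded BLAKE2b hash family and the helper names are all assumptions made for illustration.

```python
import hashlib


def _hash(token: str, seed: int) -> int:
    # One member of a hash-function family, selected by the BLAKE2b salt
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(token.encode(), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(tokens: set[str], num_hashes: int = 128) -> list[int]:
    if not tokens:
        raise ValueError("MinHash needs at least one token")
    # Keep the minimum hash value for each hash function
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_hashes)]


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    # The fraction of agreeing minima estimates |A ∩ B| / |A ∪ B|
    agree = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return agree / len(sig_a)


# Hypothetical basic blocks, each encoded as a string of normalized mnemonics
blocks_a = {"push|mov|sub", "xor|ret", "call|test|jz"}
blocks_b = {"push|mov|sub", "xor|ret", "call|cmp|jnz"}
print(estimated_jaccard(minhash_signature(blocks_a), minhash_signature(blocks_b)))
```

Here the exact Jaccard index is 2/4 = 0.5, and the printed estimate should fall near it; more hash functions reduce the variance of the estimate.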
- match_import_functions(sim_matrix: ndarray, primary: Program, secondary: Program, primary_mapping: dict, secondary_mapping: dict) → None
Anchoring phase. This phase uses import functions as anchors for the matching and sets the similarity of these functions to 1. This anchoring phase is necessary to obtain a good match.
- Parameters:
sim_matrix – the similarity matrix between the primary and the secondary, of type qbindiff.types.SimMatrix
primary – the primary binary, of type qbindiff.loader.Program
secondary – the secondary binary, of type qbindiff.loader.Program
primary_mapping – mapping between the primary function addresses and their corresponding indices
secondary_mapping – mapping between the secondary function addresses and their corresponding indices
- Returns:
None
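The anchoring idea can be sketched with the documented address-to-index mappings. The per-program import tables (address to name) and the helper match_imports are hypothetical inputs, and clearing the full row and column before writing the 1 is an assumption borrowed from the pre-pass warning above, not a documented behaviour of match_import_functions.

```python
def match_imports(
    sim_matrix: list[list[float]],
    primary_imports: dict[int, str],
    secondary_imports: dict[int, str],
    primary_mapping: dict[int, int],
    secondary_mapping: dict[int, int],
) -> None:
    # Look up secondary import addresses by name (the last address wins on duplicates)
    secondary_by_name = {name: addr for addr, name in secondary_imports.items()}
    for addr, name in primary_imports.items():
        other = secondary_by_name.get(name)
        if other is None:
            continue  # no import with this name in the secondary
        i, j = primary_mapping[addr], secondary_mapping[other]
        # Assumption: clear the full row and column, following the pre-pass rule
        sim_matrix[i] = [0.0] * len(sim_matrix[i])
        for row in sim_matrix:
            row[j] = 0.0
        sim_matrix[i][j] = 1.0  # documented behaviour: anchored imports get similarity 1


m = [[0.2, 0.3], [0.4, 0.1]]
match_imports(
    m,
    {0x4010: "malloc", 0x4020: "free"},   # hypothetical primary imports
    {0x5010: "free", 0x5020: "malloc"},   # hypothetical secondary imports
    {0x4010: 0, 0x4020: 1},
    {0x5010: 0, 0x5020: 1},
)
print(m)  # [[0.0, 1.0], [1.0, 0.0]]
```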
- normalize(program: Program) → Program
Normalize the input Program. In some cases this can raise an exception caused by a thunk function.
- Parameters:
program – the program to normalize, of type qbindiff.loader.Program
- Returns:
the normalized program
- register_feature_extractor(extractor_class: type[FeatureExtractor], weight: float | None = 1.0, distance: Distance | None = None, **extra_args) → None
Register a feature extractor class. The corresponding feature will be included in the similarity matrix computation.
- Parameters:
extractor_class – the feature extractor class to register, a FeatureExtractor subclass (see qbindiff.features.extractor)
weight – weight associated with the corresponding feature. Default is 1.0.
distance – distance used only for this feature. It does not make sense to use it with the bnb feature, but it can be useful for the WeisfeilerLehman feature.
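How weighted features might be combined can be sketched as a weighted mean of per-feature distances, with an optional per-feature distance override mirroring the distance argument. The combination rule, the feature names "bnb" and "wl" and all helper names are assumptions for illustration; the documentation does not state how qbindiff aggregates features.

```python
from typing import Callable

# A per-feature distance takes two feature vectors and returns a float
DistanceFn = Callable[[list[float], list[float]], float]


def canberra(u: list[float], v: list[float]) -> float:
    # Default distance in this sketch; 0/0 terms are skipped (count as 0)
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(u, v) if abs(a) + abs(b) != 0)


def mismatch_fraction(u: list[float], v: list[float]) -> float:
    # Hypothetical override: share of positions whose values differ
    return sum(a != b for a, b in zip(u, v)) / len(u)


def combined_distance(
    primary_features: dict[str, list[float]],
    secondary_features: dict[str, list[float]],
    weights: dict[str, float],
    overrides: dict[str, DistanceFn],
    default: DistanceFn = canberra,
) -> float:
    """Weighted mean of per-feature distances (assumed aggregation rule)."""
    total_weight = sum(weights.values())
    score = 0.0
    for name, weight in weights.items():
        dist = overrides.get(name, default)  # per-feature override, like `distance=`
        score += weight * dist(primary_features[name], secondary_features[name])
    return score / total_weight


p = {"bnb": [4.0], "wl": [1.0, 0.0, 1.0, 1.0]}
s = {"bnb": [6.0], "wl": [1.0, 1.0, 1.0, 0.0]}
print(round(combined_distance(p, s, {"bnb": 1.0, "wl": 2.0}, {"wl": mismatch_fraction}), 6))  # 0.4
```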
DiGraphDiffer
- class qbindiff.DiGraphDiffer(primary: DiGraph, secondary: DiGraph, **kwargs)
Bases: Differ
Differ implementation for two generic networkx.DiGraph objects. Parameters other than primary and secondary are passed to Differ through **kwargs.
- Parameters:
primary – primary graph
secondary – secondary graph
sparsity_ratio – the sparsity ratio enforced on the similarity matrix, of type qbindiff.types.Ratio
tradeoff – tradeoff ratio between node similarity (tradeoff=1.0) and edge similarity (tradeoff=0.0), of type qbindiff.types.Ratio
epsilon – perturbation parameter that enforces convergence and speeds up the computation, of type qbindiff.types.Positive. The larger the value, the faster the computation, but the less accurate the result
maxiter – maximum number of message passing iterations
sparse_row – Whether to build the sparse similarity matrix considering its entirety or processing it row per row