Differs¶
GenericGraph¶
- class qbindiff.GenericGraph[source]¶
Bases:
object
Abstract class representing a generic graph
- abstract property edges: Iterable[tuple[NodeLabel, NodeLabel]]¶
Iterate over the edges. An edge is a pair (node_label_a, node_label_b)
- Returns:
An
Iterable
over the edges.
- abstract get_node(node_label: NodeLabel) GenericNode [source]¶
Get the node identified by the node_label
- Parameters:
node_label – the unique identifier of the node
- Returns:
The node identified by the label
- abstract items() Iterable[tuple[NodeLabel, GenericNode]] [source]¶
Iterate over the items. Each item is a tuple (node_label, node)
- Returns:
An
Iterable
over the items. Each item is a tuple (node_label, node)
- abstract property node_labels: Iterable[NodeLabel]¶
Iterate over the node labels
- Returns:
An
Iterable
over the node labels
- abstract property nodes: Iterable[GenericNode]¶
Iterate over the nodes themselves
- Returns:
An
Iterable
over the nodes
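A minimal sketch of a graph exposing the five members listed above. It deliberately does not import qbindiff: DictGraph and DictNode are hypothetical stand-ins, and a real implementation would subclass qbindiff.GenericGraph and return GenericNode objects.

```python
from __future__ import annotations

from typing import Hashable, Iterable


class DictNode:
    """Hypothetical stand-in for a GenericNode: it only stores its label."""

    def __init__(self, label: Hashable) -> None:
        self.label = label


class DictGraph:
    """Hypothetical graph with the same members as the GenericGraph interface."""

    def __init__(self, adjacency: dict[Hashable, list[Hashable]]) -> None:
        # adjacency maps each node label to the labels of its successors
        self._adjacency = adjacency
        self._nodes = {label: DictNode(label) for label in adjacency}

    @property
    def edges(self) -> Iterable[tuple[Hashable, Hashable]]:
        # Each edge is a pair (node_label_a, node_label_b)
        return [(src, dst) for src, dsts in self._adjacency.items() for dst in dsts]

    def get_node(self, node_label: Hashable) -> DictNode:
        return self._nodes[node_label]

    def items(self) -> Iterable[tuple[Hashable, DictNode]]:
        # Each item is a tuple (node_label, node)
        return self._nodes.items()

    @property
    def node_labels(self) -> Iterable[Hashable]:
        return self._nodes.keys()

    @property
    def nodes(self) -> Iterable[DictNode]:
        return self._nodes.values()


graph = DictGraph({"main": ["parse", "run"], "parse": [], "run": ["parse"]})
print(list(graph.edges))
print(list(graph.node_labels))
```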
GenericNode¶
Differ¶
- class qbindiff.Differ(primary: GenericGraph, secondary: GenericGraph, *, sparsity_ratio: float = 0.6, tradeoff: float = 0.8, epsilon: float = 0.9, maxiter: int = 1000, sparse_row: bool = False)[source]¶
Bases:
object
Abstract class that performs the NAP diffing between two generic graphs.
- Parameters:
primary – primary graph
secondary – secondary graph
sparsity_ratio – the sparsity ratio enforced to the similarity matrix of type
qbindiff.types.Ratio
tradeoff – tradeoff ratio between node similarity (tradeoff=1.0) and edge similarity (tradeoff=0.0) of type
qbindiff.types.Ratio
epsilon – perturbation parameter to enforce convergence and speed up computation, of type
qbindiff.types.Positive
. The greater the value, the faster the computation, but the less accurate the result
maxiter – maximum number of message passing iterations
sparse_row – Whether to build the sparse similarity matrix considering its entirety or processing it row per row
- DTYPE¶
alias of
float32
- compute_matching() Mapping | None [source]¶
Run the belief propagation algorithm. This method blocks until the computation is done. The resulting matching is returned as a Mapping object.
- Returns:
Mapping between items of the primary and items of the secondary
- extract_adjacency_matrix(graph: Graph) tuple[AdjacencyMatrix, dict[Idx, NodeLabel], dict[NodeLabel, Idx]] [source]¶
Returns the adjacency matrix for the graph and the mappings
- Parameters:
graph – Graph whose adjacency matrix should be extracted
- Returns:
A tuple containing in this order: the adjacency matrix of the graph, the map between index to label, the map between label to index.
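The shape of the returned tuple can be illustrated with a small pure-Python sketch. This is not the library's implementation: adjacency_with_mappings is a hypothetical helper, and a nested list stands in for the AdjacencyMatrix type.

```python
from __future__ import annotations

from typing import Hashable


def adjacency_with_mappings(
    labels: list[Hashable], edges: list[tuple[Hashable, Hashable]]
) -> tuple[list[list[int]], dict[int, Hashable], dict[Hashable, int]]:
    """Return (adjacency matrix, index -> label map, label -> index map)."""
    idx_to_label = dict(enumerate(labels))
    label_to_idx = {label: idx for idx, label in idx_to_label.items()}
    size = len(idx_to_label)
    matrix = [[0] * size for _ in range(size)]
    for src, dst in edges:
        # Row = source node index, column = destination node index
        matrix[label_to_idx[src]][label_to_idx[dst]] = 1
    return matrix, idx_to_label, label_to_idx


matrix, idx_to_label, label_to_idx = adjacency_with_mappings(
    ["main", "parse", "run"],
    [("main", "parse"), ("main", "run"), ("run", "parse")],
)
for row in matrix:
    print(row)
```

The two dictionaries are inverses of each other, so a matrix index can always be translated back to the node label it came from.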
- get_similarities(primary_idx: list[Idx], secondary_idx: list[Idx]) ArrayLike1D [source]¶
Returns the similarity scores between the nodes specified as parameter. By default, it uses the similarity matrix. This method is meant to be overridden by subclasses to give more meaningful scores
- Parameters:
primary_idx – the list of integers that represent nodes inside the primary graph
secondary_idx – the list of integers that represent nodes inside the secondary graph
- Returns:
A sequence with the corresponding similarities of the given nodes
- matching_iterator() Generator[int, None, None] [source]¶
Run the belief propagation algorithm.
- Returns:
A generator that yields the iteration number until the algorithm either converges or reaches
self.maxiter
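The generator pattern described here, yielding the iteration number until convergence or maxiter, can be sketched as follows. The update step is a toy halving rule standing in for belief propagation, the names iterate_until_done and Halver are hypothetical, and starting the count at 1 is an assumption.

```python
from typing import Callable, Generator


def iterate_until_done(
    step: Callable[[], bool], maxiter: int
) -> Generator[int, None, None]:
    """Yield the iteration number after each step, stopping on convergence or at maxiter."""
    for niter in range(1, maxiter + 1):
        converged = step()
        yield niter
        if converged:
            return


class Halver:
    """Toy update rule: halve a value until it drops below a tolerance."""

    def __init__(self, value: float, tolerance: float) -> None:
        self.value = value
        self.tolerance = tolerance

    def __call__(self) -> bool:
        self.value /= 2
        return self.value < self.tolerance


# Converges after a few steps, well before maxiter
iterations = list(iterate_until_done(Halver(1.0, 0.1), maxiter=1000))
# Never converges, so the generator stops at maxiter
capped = list(iterate_until_done(Halver(1.0, 0.0), maxiter=5))
print(iterations, capped)
```

Because the caller drives the generator, it can update a progress bar or abort early between iterations, which a blocking call such as compute_matching() does not allow.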
- primary¶
Primary graph
- process_iterator() Iterator[int] [source]¶
Initialize all the variables for the NAP algorithm in an iterative way. It returns an iterator that can be used for tracking the progress.
- Returns:
An iterator of values in the range [0, 1000] used for tracking progress. It might contain more than 1000 elements.
- register_prepass(pass_func: GenericPrePass, **extra_args) None [source]¶
Register a new pre-pass that will operate on the similarity matrix. The passes will be called in the same order as they are registered and each one of them will operate on the output of the previous one.
Warning: A prepass should assign values to the full row or the full column; it should never assign single entries in the matrix.
- Parameters:
pass_func – Pass method to apply on the similarity matrix. Example: a pass that first matches import functions.
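The ordering rule (passes run in registration order, each on the output of the previous one) can be sketched with a small stand-in. PrePassChain, pin_row and scale_row are hypothetical names, and the pass signature used here is a simplification chosen for illustration, not the GenericPrePass signature.

```python
from typing import Any, Callable, Dict, List, Tuple

Matrix = List[List[float]]


class PrePassChain:
    """Hypothetical stand-in that stores passes and runs them in registration order."""

    def __init__(self) -> None:
        self._passes: List[Tuple[Callable[..., None], Dict[str, Any]]] = []

    def register_prepass(self, pass_func: Callable[..., None], **extra_args: Any) -> None:
        self._passes.append((pass_func, extra_args))

    def run(self, sim_matrix: Matrix) -> None:
        # Each pass mutates the matrix left behind by the previous pass
        for pass_func, extra_args in self._passes:
            pass_func(sim_matrix, **extra_args)


def pin_row(sim_matrix: Matrix, row: int, col: int) -> None:
    """Assign the full row: 1.0 at `col`, 0.0 everywhere else."""
    sim_matrix[row] = [1.0 if j == col else 0.0 for j in range(len(sim_matrix[row]))]


def scale_row(sim_matrix: Matrix, row: int, factor: float) -> None:
    """Assign the full row by scaling every entry."""
    sim_matrix[row] = [value * factor for value in sim_matrix[row]]


sim = [[0.2, 0.4], [0.5, 0.5]]
chain = PrePassChain()
chain.register_prepass(pin_row, row=0, col=1)
chain.register_prepass(scale_row, row=0, factor=0.5)
chain.run(sim)
print(sim)
```

Both example passes write a whole row at once, in line with the warning above about never assigning single entries.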
- secondary¶
Secondary graph
QBinDiff¶
- class qbindiff.QBinDiff(primary: Program, secondary: Program, distance: Distance = Distance.haussmann, normalize: bool = False, **kwargs)[source]¶
Bases:
Differ
QBinDiff class that provides a high-level interface to trigger a diff between two binaries.
- Parameters:
primary – The primary binary of type
qbindiff.loader.Program
secondary – The secondary binary of type
qbindiff.loader.Program
distance – the distance function used when comparing the feature vectors extracted from the graphs.
normalize – Normalize the call graphs of the two programs with a series of heuristics. Look at
normalize()
for more information.
- DTYPE¶
alias of
float32
- export_to_bindiff(filename: str) None [source]¶
Exports the diffing results in the BinDiff format
- Parameters:
filename – Name of the output diffing file
- get_similarities(primary_idx: list[int], secondary_idx: list[int]) ArrayLike1D [source]¶
Returns the similarity scores between the nodes specified as parameter. Uses MinHash fuzzy hash at basic block level to give a similarity score.
- Parameters:
primary_idx – List of node indexes inside the primary
secondary_idx – List of node indexes inside the secondary
- Returns:
A sequence with the corresponding similarities of the given nodes
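For readers unfamiliar with MinHash, the general idea can be sketched in pure Python: two token sets are summarized by fixed-size signatures, and the fraction of matching signature slots estimates their Jaccard similarity. This is a generic illustration, not the hashing scheme qbindiff uses; the function names, the use of mnemonic strings as tokens, and the choice of blake2b are all assumptions.

```python
from __future__ import annotations

import hashlib


def minhash_signature(tokens: set[str], num_perm: int = 64) -> list[int]:
    """Build a MinHash signature: for each salted hash, keep the minimum over all tokens.

    `tokens` must be non-empty.
    """
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # a different salt acts as a different hash function
        signature.append(
            min(
                int.from_bytes(
                    hashlib.blake2b(token.encode(), digest_size=8, salt=salt).digest(),
                    "little",
                )
                for token in tokens
            )
        )
    return signature


def estimated_similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of slots where both signatures agree (estimates Jaccard similarity)."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Hypothetical basic blocks, each reduced to a set of instruction mnemonics
primary_block = {"push", "mov", "call", "ret"}
secondary_block = {"push", "mov", "jmp", "ret"}

same = estimated_similarity(
    minhash_signature(primary_block), minhash_signature(primary_block)
)
partial = estimated_similarity(
    minhash_signature(primary_block), minhash_signature(secondary_block)
)
print(same, partial)
```

Identical sets always score 1.0; partially overlapping sets score somewhere between 0 and 1, with the estimate becoming tighter as num_perm grows.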
- match_import_functions(sim_matrix: SimMatrix, primary: Program, secondary: Program, primary_mapping: dict[Addr, Idx], secondary_mapping: dict[Addr, Idx]) None [source]¶
Anchoring phase. This phase considers import functions as anchors to the matching and sets the similarity of these functions to 1. This anchoring phase is necessary to obtain a good match.
- Parameters:
sim_matrix – The similarity matrix between the primary and secondary, of type
qbindiff.types.SimMatrix
primary – The primary binary of type
qbindiff.loader.Program
secondary – The secondary binary of type
qbindiff.loader.Program
primary_mapping – Mapping between the primary function addresses and their corresponding index
secondary_mapping – Mapping between the secondary function addresses and their corresponding index
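A minimal sketch of the anchoring idea, assuming imports are paired by name and that the rest of the anchored row and column is zeroed so the pair cannot match anything else. Both assumptions go beyond what the description states, anchor_imports is a hypothetical helper, and plain dictionaries stand in for the Program objects.

```python
from __future__ import annotations


def anchor_imports(
    sim_matrix: list[list[float]],
    primary_imports: dict[int, str],    # hypothetical: address -> import name
    secondary_imports: dict[int, str],  # hypothetical: address -> import name
    primary_mapping: dict[int, int],    # address -> row index
    secondary_mapping: dict[int, int],  # address -> column index
) -> None:
    """Set the similarity of same-named imports to 1, in place."""
    secondary_by_name = {name: addr for addr, name in secondary_imports.items()}
    for addr, name in primary_imports.items():
        other_addr = secondary_by_name.get(name)
        if other_addr is None:
            continue  # import not present in the secondary: nothing to anchor
        row = primary_mapping[addr]
        col = secondary_mapping[other_addr]
        # Assumption: clear the full row and full column before pinning the anchor
        sim_matrix[row] = [0.0] * len(sim_matrix[row])
        for matrix_row in sim_matrix:
            matrix_row[col] = 0.0
        sim_matrix[row][col] = 1.0


sim = [[0.3, 0.6, 0.1], [0.2, 0.2, 0.9]]
anchor_imports(
    sim,
    primary_imports={0x1000: "malloc"},
    secondary_imports={0x2008: "malloc"},
    primary_mapping={0x1000: 0, 0x1010: 1},
    secondary_mapping={0x2000: 0, 0x2008: 1, 0x2010: 2},
)
print(sim)
```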
- normalize(program: Program) Program [source]¶
Normalize the input Program. In some cases, this can raise an exception caused by a thunk function.
- Parameters:
program – the program of type
qbindiff.loader.Program
to normalize.
- Returns:
the normalized program
- register_feature_extractor(extractor_class: type[FeatureExtractor], weight: float | None = 1.0, distance: Distance | None = None, **extra_args) None [source]¶
Register a feature extractor class. This will include the corresponding feature in the similarity matrix computation
- Parameters:
extractor_class – A feature extractor of type
qbindiff.features.extractor
weight – Weight associated to the corresponding feature. Default is 1.
distance – Distance used only for this feature. It does not make sense to use it with the bnb feature, but it can be useful for the WeisfeilerLehman feature.
- register_postpass(pass_func: GenericPostPass, **extra_args) None [source]¶
Register a new post-pass that will operate on the similarity matrix. The passes will be called in the same order as they are registered and each one of them will operate on the output of the previous one.
- Parameters:
pass_func – Pass method to apply on the similarity matrix. Example: a pass that enforces matches based on certain extracted features.
DiGraphDiffer¶
- class qbindiff.DiGraphDiffer(primary: DiGraph, secondary: DiGraph, **kwargs)[source]¶
Bases:
Differ
Differ implementation for two generic networkx.DiGraph
Abstract class that performs the NAP diffing between two generic graphs.
- Parameters:
primary – primary graph
secondary – secondary graph
sparsity_ratio – the sparsity ratio enforced to the similarity matrix of type
qbindiff.types.Ratio
tradeoff – tradeoff ratio between node similarity (tradeoff=1.0) and edge similarity (tradeoff=0.0) of type
qbindiff.types.Ratio
epsilon – perturbation parameter to enforce convergence and speed up computation, of type
qbindiff.types.Positive
. The greater the value, the faster the computation, but the less accurate the result
maxiter – maximum number of message passing iterations
sparse_row – Whether to build the sparse similarity matrix considering its entirety or processing it row per row
- gen_sim_matrix(sim_matrix: SimMatrix, *args, **kwargs) None [source]¶
Initialize the similarity matrix
- Parameters:
sim_matrix – The similarity matrix of type
qbindiff.types.SimMatrix
- Returns:
None