Academic Publications

Warning

This table is a republication of the great Awesome-Binary-Similarity work by SystemSecurityStorm. Please contribute to this repo!

Awesome Binary Similarity

Title

Venue

Year

Paper

Slide

Video

Github

BinaryAI: Binary Software Composition Analysis via Intelligent Binary Source Code Matching

ICSE

2024

link

Code is not Natural Language: Unlock the Power of Semantics-Oriented Graph Representation for Binary Code Similarity Detection

USENIX

2024

link

link

CLAP: Learning Transferable Binary Code Representations with Natural Language Supervision

ISSTA

2024

link

link

CEBin: A Cost-Effective Framework for Large-Scale Binary Code Similarity Detection

ISSTA

2024

link

link

FASER: Binary Code Similarity Search through the use of Intermediate Representations

CAMLIS

2023

link

link

link

VEXIR2Vec: An Architecture-Neutral Embedding Framework for Binary Similarity

2023

link

kTrans: Knowledge-Aware Transformer for Binary Code Embedding

2023

link

link

Improving Binary Code Similarity Transformer Models by Semantics-Driven Instruction Deemphasis

ISSTA

2023

link

link

Asteria-Pro: Enhancing Deep-Learning Based Binary Code Similarity Detection by Incorporating Domain Knowledge

TOSEM

2023

link

link

sem2vec: Semantics-aware Assembly Tracelet Embedding

TOSEM

2023

link

link

1-to-1 or 1-to-n? Investigating the effect of function inlining on binary similarity analysis

TOSEM

2023

link

Binary Function Clone Search in the Presence of Code Obfuscation and Optimization over Multi-CPU Architectures

AsiaCCS

2023

link

VulHawk: Cross-architecture Vulnerability Detection with Entropy-based Binary Code Search

NDSS

2023

link

link

A Game-Based Framework to Compare Program Classifiers and Evaders

CGO

2023

link

link

link

link

BBDetector: A Precise and Scalable Third-Party Library Detection in Binary Executables with Fine-Grained Function-Level Features

MDPI

2023

link

A Survey of Binary Code Fingerprinting Approaches: Taxonomy, Methodologies, and Features

CSUR

2022

link

Practical Binary Code Similarity Detection with BERT-based Transferable Similarity Learning

ACSAC

2022

link

link

link

Improving cross-platform binary analysis using representation learning via graph alignment

ISSTA

2022

link

link

link

jTrans: Jump-Aware Transformer for Binary Code Similarity

ISSTA

2022

link

link

link

COBRA-GCN: Contrastive Learning to Optimize Binary Representation Analysis with Graph Convolutional Networks

DIMVA

2022

link

A Large-Scale Empirical Analysis of the Vulnerabilities Introduced by Third-Party Components in IoT Firmware

ISSTA

2022

link

link

link

How Machine Learning Is Solving the Binary Function Similarity Problem

USENIX

2022

link

link

link

Enhancing DNN-Based Binary Code Function Search With Low-Cost Equivalence Checking

TSE

2022

link

link

Program Representations for Predictive Compilation: State of Affairs in the Early 20’s

COLA

2022

link

link

link

Improving binary diffing speed and accuracy using community detection and locality-sensitive hashing: an empirical study

JCVHT

2022

link

PalmTree: Learning an Assembly Language Model for Instruction Embedding

CCS

2021

link

link

link

Binary code similarity detection

ASE

2021

link

Binary diffing as a network alignment problem via belief propagation

ASE

2021

link

Asteria: Deep Learning-based AST-Encoding for Cross-platform Binary Code Similarity Detection

IEEE DSN

2021

link

link

BinDeep: A deep learning approach to binary code similarity detection

ESWA

2021

link

EnBinDiff: Identifying Data-Only Patches for Binaries

TDSC

2021

link

BinDiffNN: Learning Distributed Representation of Assembly for Robust Binary Diffing Against Semantic Differences

TSE

2021

link

link

Codee: A Tensor Embedding Scheme for Binary Code Search

TSE

2021

link

link

Revisiting Binary Code Similarity Analysis using Interpretable Feature Engineering and Lessons Learned

TSE (revision)

2021

link

link

How could Neural Networks understand Programs?

ICML

2021

link

link

Multi-threshold token-based code clone detection

SANER

2021

link

FastSpec: Scalable Generation and Detection of Spectre Gadgets Using Neural Embeddings

IEEE EuroS&P

2021

link

link

link

TREX: Learning Execution Semantics from Micro-Traces for Binary Similarity

2020

link

link

Similarity of Binaries Across Optimization Levels and Obfuscation

ESORICS

2020

link

link

Open-source tools and benchmarks for code-clone detection: past, present, and future trends

2020

link

Semantically Find Similar Binary Codes with Mixed Key Instruction Sequence

2020

LibDX: A Cross-Platform and Accurate System to Detect Third-Party Libraries in Binary Code

2020

link

Detecting Code Clones with Graph Neural Network and Flow-Augmented Abstract Syntax Tree

SANER

2020

link

What You See is What it Means! Semantic Representation Learning of Code based on Visualization and Transfer Learning

2020

link

Clone Detection on Large Scala Codebases

2020

link

CloneCompass: Visualizations for Code Clone Analysis

2020

link

DEEPBINDIFF: Learning Program-Wide Code Representations for Binary Diffing

NDSS

2020

link

link

link

VGraph: A Robust Vulnerable Code Clone Detection System Using Code Property Triplets

EuroS&P

2020

link

Order Matters: Semantic-Aware Neural Networks for Binary Code Similarity Detection

AAAI

2020

link

Similarity Metric Method for Binary Basic Blocks of Cross-Instruction Set Architecture

NDSS

2020

link

link

Investigating Graph Embedding Neural Networks with Unsupervised Features Extraction for Binary Analysis

NDSS Workshop on Binary Analysis Research (BAR)

2019

link

link

Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization

IEEE S&P

2019

link

link

link

Semantic-Based Representation Binary Clone Detection for Cross-Architectures in the Internet of Things

MDPI

2019

link

A Survey of Binary Code Similarity

CSUR

2019

link

Research Progress on Code Clone Detection (代码克隆检测研究进展)

Journal of Software (软件学报)

2019

link

A Systematic Review on Code Clone Detection

2019

link

A Cross-Architecture Instruction Embedding Model for Natural Language Processing-Inspired Binary Code Analysis

NDSS

2019

link

link

Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs

NDSS

2019

link

link

link

model

SAFE: Self-Attentive Function Embeddings for Binary Similarity

2019

link

link

link

Learning-Based Recursive Aggregation of Abstract Syntax Trees for Code Clone Detection

SANER

2019

link

Deep Learning-Based Cross-Platform Binary Code Association Analysis (基于深度学习的跨平台二进制代码关联分析)

2019

link

CVSkSA: cross-architecture vulnerability search in firmware based on kNN-SVM and attributed control flow graph

2019

link

Function matching between binary executables: efficient algorithms and features

JCVHT

2019

link

BinMatch: A Semantics-based Hybrid Approach on Binary Code Clone Analysis

ICSME

2018

link

αDiff: Cross-Version Binary Code Similarity Detection with DNN

ASE

2018

link

dataset

Binary Similarity Detection Using Machine Learning

PLDI

2018

link

CCAligner: A Token Based Large-Gap Clone Detector

ICSE

2018

link

Oreo: Detection of Clones in the Twilight Zone

FSE

2018

link

VulSeeker: A Semantic Learning Based Vulnerability Seeker for Cross-platform Binary

ASE

2018

link

link

VulSeeker-pro: enhanced semantic learning based binary vulnerability seeker with emulation

2018

link

FirmUp: Precise Static Detection of Common Vulnerabilities in Firmware

2018

link

BINARM: Scalable and Efficient Detection of Vulnerabilities in Firmware Images of Intelligent Electronic Devices

2018

link

A Resilient and Efficient System for Identifying FOSS Functions in Malware Binaries

2018

link

Beyond Precision and Recall: Understanding Uses (and Misuses) of Similarity Hashes in Binary Analysis

2018

link

link

BCD: Decomposing Binary Code Into Components Using Graph-Based Clustering

ASIA CCS

2018

link

A Deep Learning Approach to Program Similarity

MASES

2018

link

Recurrent Neural Network for Code Clone Detection

SEIM

2018

link

The Adverse Effects of Code Duplication in Machine Learning Models of Code

2018

link

link

Benchmarks for software clone detection: A ten-year retrospective

SANER

2018

link

Binary Code Clone Detection across Architectures and Compiling Configurations

ICPC

2017

link

Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection

ACM CCS

2017

link

link

BinSequence: Fast, Accurate and Scalable Binary Code Reuse Detection

ASIA CCS

2017

link

BinShape: Scalable and Robust Binary Library Function Identification Using Function Shape

DIMVA

2017

link

Compiler-agnostic function detection in binaries

IEEE EuroS&P

2017

link

link

BinSign: Fingerprinting binary functions to support automated analysis of code executables

2017

link

Similarity of binaries through re-optimization

PLDI

2017

link

link

Transferring code-clone detection and analysis to practice

ICSE-SEIP

2017

link

Cryptographic Function Detection in Obfuscated Binaries via Bit-Precise Symbolic Loop Mapping

IEEE S&P

2017

link

Supervised Deep Features for Software Functional Clone Detection by Exploiting Lexical and Syntactical Information in Source Code

IJCAI

2017

link

Extracting Conditional Formulas for Cross-Platform Bug Search

ASIA CCS

2017

link

SPAIN: Security Patch Analysis for Binaries Towards Understanding the Pain and Pills

ICSE

2017

link

CCLearner: A Deep Learning-Based Clone Detection Approach

2017

link

link

BinSim: Trace-based Semantic Binary Diffing via System Call Sliced Segment Equivalence Checking

USENIX

2017

link

link

link

In-memory Fuzzing for Binary Code Similarity Analysis

ASE

2017

link

DéjàVu: a map of code duplicates on GitHub

OOPSLA

2017

link

Some from Here, Some from There: Cross-project Code Reuse in GitHub

MSR

2017

link

CVSSA: Cross-Architecture Vulnerability Search in Firmware Based on Support Vector Machine and Attributed Control Flow Graph

2017

link

Identifying Functionally Similar Code in Complex Codebases

ICPC

2016

link

link

Scalable graph-based bug search for firmware images (Genius)

ACM CCS

2016

link

link

link

Cross-Architecture Binary Semantics Understanding via Similar Code Comparison

IEEE SANER

2016

link

discovRE: Efficient cross-architecture identification of bugs in binary code

NDSS

2016

link

BinGo: Cross-architecture cross-OS Binary Search

FSE

2016

link

Kam1n0: Mapreduce-based assembly clone search for reverse engineering

KDD

2016

link

link

Statistical similarity of binaries

PLDI

2016

link

link

link

Deep learning code fragments for code clone detection

ASE

2016

link

A Survey of Software Clone Detection Techniques

2016

link

SourcererCC: Scaling Code Clone Detection to Big Code

ICSE

2016

link

Binary executable file similarity calculation using function matching

2016

link

Matching Similar Functions in Different Versions of a Malware

2016

link

BinDNN: Resilient Function Matching Using Deep Learning

2016

link

VulPecker: An Automated Vulnerability Detection System Based on Code Similarity Analysis

ACSAC

2016

link

link

BigCloneEval: A Clone Detection Tool Evaluation Framework with BigCloneBench

2016

link

link

Cross-architecture bug search in binary executables

IEEE S&P

2015

link

Library functions identification in binary code by using graph isomorphism testings

2015

link

Evaluating clone detection tools with BigCloneBench

2015

link

link

Memoized semantics-based binary diffing with application to malware lineage inference

2015

link

Sigma: A semantic integrated graph matching approach for identifying reused functions in binary code

2015

link

link

BYTEWEIGHT: Learning to Recognize Functions in Binary Code

USENIX

2014

link

link

link

Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection

FSE

2014

link

BinClone: Detecting code clones in malware

SERE

2014

link

link

Detecting fine-grained similarity in binaries

2014

link

Leveraging semantic signatures for bug search in binary programs

ACSAC

2014

link

How Accurate Is Coarse-grained Clone Detection?: Comparison with Fine-grained Detectors

2014

link

Tracelet-based code search in executables

PLDI

2014

link

Control Flow-Based Malware Variant Detection

2014

link

Hashing for Similarity Search: A Survey

2014

link

Achieving accuracy and scalability simultaneously in detecting application clones on android markets

ICSE

2014

link

Identifying Shared Software Components to Support Malware Forensics

2014

link

Evaluating Modern Clone Detection Tools

2014

link

Rendezvous: a search engine for binary code

MSR

2013

link

BinSlayer: accurate comparison of binary executables

PPREW

2013

link

link

Software clone detection: A systematic review

2013

link

How to extract differences from similar programs? A cohesion metric approach

2013

link

Software clone detection and refactoring

2013

link

An Emerging Approach towards Code Clone Detection: Metric Based Approach on Byte Code

2013

link

A hybrid-token and textual based approach to find similar code segments

2013

link

Gapped code clone detection with lightweight source code analysis

2013

link

MutantX-S: Scalable Malware Clustering Based on Static Features

USENIX

2013

link

link

BinJuice: Fast Location of Similar Code Fragments Using Semantic Juice

PPREW

2013

link

Towards Automatic Software Lineage Inference

USENIX

2013

link

link

AnDarwin: Scalable Detection of Semantically Similar Android Applications

2013

link

Expose: Discovering potential binary code re-use

2013

link

Function Matching-based Binary level Software Similarity Calculation

RACS

2013

link

FIRMA: Malware Clustering and Network Signature Generation with Mixed Network Behaviors

RAID

2013

link

A study of repetitiveness of code changes in software evolution

ASE

2013

link

iBinHunt: Binary hunting with interprocedural control flow

2012

link

link

ReDeBug: Finding Unpatched Code Clones in Entire OS Distributions

USENIX

2012

link

Boreas: an accurate and scalable token-based approach to code clone detection

ASE

2012

link

Folding Repeated Instructions for Improving Token-Based Code Clone Detection

2012

link

A metrics-based data mining approach for software clone detection

2012

link

Comparison of Clone Detection Techniques

2012

Malware Classification Method via Binary Content Comparison

RACS

2012

link

Binary function clustering using semantic hashes

ICMLA

2012

link

Value-based program characterization and its application to software plagiarism detection

2011

link

CMCD: Count Matrix Based Code Clone Detection

2011

link

Incremental code clone detection: A pdg-based approach

2011

link

Anywhere, Any-Time Binary Instrumentation

2011

link

Code reuse in open source software development: Quantitative evidence, drivers, and impediments

2010

Index-based code clone detection: incremental, distributed, scalable

2010

Detection of Type-1 and Type-2 Code Clones Using Textual Analysis and Metrics

2010

A hybrid approach (syntactic and textual) to clone detection

2010

Evaluating code clone genealogies at release level: An empirical study

2010

A survey of Binary similarity and distance measures

2010

Idea: Opcode-Sequence-Based Malware Detection

2010

Behavioral Clustering of HTTP-Based Malware and Signature Generation Using Malicious Network Traces

USENIX

2010

Data fingerprinting with similarity digests

2010

Automatic mining of functionally equivalent code fragments via random testing

2009

A mutation/injection-based automatic framework for evaluating code clone detection tools

2009

Problematic code clones identification using multiple detection results

2009

Incremental clone detection

2009

Scalable and incremental clone detection for evolving software

2009

Large-scale Malware Indexing Using Function-call Graphs

2009

Scalable, Behavior-Based Malware Clustering

2009

peHash: A Novel Approach to Fast Malware Clustering

USENIX

2009

Detecting Code Clones in Binary Executables

2009

BinHunt: Automatically finding semantic differences in binary programs

2008

Scalable detection of semantic clones

2008

Deckard: Scalable and accurate tree-based detection of code clones

2007

Large-scale code reuse in open source software

2007

A survey on software clone detection research

2007

link

A study of consistent and inconsistent changes to code clones

2007

Comparison and evaluation of clone detection tools

2007

Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions

2007

A Static Birthmark of Binary Executables Based on API Call Structure

2007

CP-Miner: Finding copy-paste and related bugs in large-scale software code

2006

Survey of research on software clones

2006

link

“Cloning considered harmful” considered harmful: patterns of cloning in software

2006

link

GPLAG: detection of software plagiarism by program dependence graph analysis

2006

Detecting Self-mutating Malware Using Control-flow Graph Matching

2006

Identifying Almost Identical Files Using Context Triggered Piecewise Hashing

2006

Hamsa: Fast signature generation for zero-day polymorphic worms with provable attack resilience

IEEE S&P

2006

Graph-based comparison of executable objects

2005

SDD: high performance code clone detection system for large scale source code

2005

link

Polygraph: Automatically generating signatures for polymorphic worms

2005

K-gram Based Software Birthmarks

2005

Insights into System-Wide Code Duplication

IEEE

2004

link

Clone detection in source code by frequent itemset techniques

2004

Evaluating clone detection techniques from a refactoring perspective

2004

Structural comparison of executable objects

2004

Code compaction of matching single-entry multiple-exit regions

2003

link

CloSpan: Mining closed sequential patterns in large datasets

2003

CCFinder: a multilinguistic token-based code clone detection system for large scale source code

2002

Identifying similar code with program dependence graphs

2001

Using slicing to identify duplication in source code

2001

BMAT – A Binary Matching Tool for Stale Profile Propagation

2000

A language independent approach for detecting duplicated code

1999

Compressing Differences of Executable Code

1999

Similarity search in high dimensions via hashing

1999

Clone detection using abstract syntax trees

1998

Experiment on the Automatic Detection of Function Clones in a Software System Using Metrics

1996

Pattern matching for clone and concept detection

1996

On finding duplication and near-duplication in large software systems

1995

link

Detecting code similarity using patterns

1995

A Cross-platform Binary Diff

1995