Academic Publications
Warning
This table is a republication of the great Awesome-Binary-Similarity work by SystemSecurityStorm. Please contribute to this repo!
Awesome Binary Similarity
Title | Venue | Year | Paper | Slide | Video | Github |
---|---|---|---|---|---|---|
BinaryAI: Binary Software Composition Analysis via Intelligent Binary Source Code Matching | ICSE | 2024 | | | | |
Code is not Natural Language: Unlock the Power of Semantics-Oriented Graph Representation for Binary Code Similarity Detection | USENIX | 2024 | | | | |
CLAP: Learning Transferable Binary Code Representations with Natural Language Supervision | ISSTA | 2024 | | | | |
CEBin: A Cost-Effective Framework for Large-Scale Binary Code Similarity Detection | ISSTA | 2024 | | | | |
FASER: Binary Code Similarity Search through the use of Intermediate Representations | CAMLIS | 2023 | | | | |
VEXIR2Vec: An Architecture-Neutral Embedding Framework for Binary Similarity | | 2023 | | | | |
kTrans: Knowledge-Aware Transformer for Binary Code Embedding | | 2023 | | | | |
Improving Binary Code Similarity Transformer Models by Semantics-Driven Instruction Deemphasis | ISSTA | 2023 | | | | |
Asteria-Pro: Enhancing Deep-Learning Based Binary Code Similarity Detection by Incorporating Domain Knowledge | TOSEM | 2023 | | | | |
sem2vec: Semantics-aware Assembly Tracelet Embedding | TOSEM | 2023 | | | | |
1-to-1 or 1-to-n? Investigating the effect of function inlining on binary similarity analysis | TOSEM | 2023 | | | | |
Binary Function Clone Search in the Presence of Code Obfuscation and Optimization over Multi-CPU Architectures | AsiaCCS | 2023 | | | | |
VulHawk: Cross-architecture Vulnerability Detection with Entropy-based Binary Code Search | NDSS | 2023 | | | | |
A Game-Based Framework to Compare Program Classifiers and Evaders | CGO | 2023 | | | | |
BBDetector: A Precise and Scalable Third-Party Library Detection in Binary Executables with Fine-Grained Function-Level Features | MDPI | 2023 | | | | |
A Survey of Binary Code Fingerprinting Approaches: Taxonomy, Methodologies, and Features | CSUR | 2022 | | | | |
Practical Binary Code Similarity Detection with BERT-based Transferable Similarity Learning | ACSAC | 2022 | | | | |
Improving cross-platform binary analysis using representation learning via graph alignment | ISSTA | 2022 | | | | |
jTrans: Jump-Aware Transformer for Binary Code Similarity | ISSTA | 2022 | | | | |
COBRA-GCN: Contrastive Learning to Optimize Binary Representation Analysis with Graph Convolutional Networks | DIMVA | 2022 | | | | |
A Large-Scale Empirical Analysis of the Vulnerabilities Introduced by Third-Party Components in IoT Firmware | ISSTA | 2022 | | | | |
How Machine Learning Is Solving the Binary Function Similarity Problem | USENIX | 2022 | | | | |
Enhancing DNN-Based Binary Code Function Search With Low-Cost Equivalence Checking | TSE | 2022 | | | | |
Program Representations for Predictive Compilation: State of Affairs in the Early 20’s | COLA | 2022 | | | | |
Improving binary diffing speed and accuracy using community detection and locality-sensitive hashing: an empirical study | JCVHT | 2022 | | | | |
PalmTree: Learning an Assembly Language Model for Instruction Embedding | CCS | 2021 | | | | |
Binary code similarity detection | ASE | 2021 | | | | |
Binary diffing as a network alignment problem via belief propagation | ASE | 2021 | | | | |
Asteria: Deep Learning-based AST-Encoding for Cross-platform Binary Code Similarity Detection | IEEE DSN 2021 | 2021 | | | | |
BinDeep: A deep learning approach to binary code similarity detection | ESWA | 2021 | | | | |
EnBinDiff: Identifying Data-Only Patches for Binaries | TDSC | 2021 | | | | |
BinDiffNN: Learning Distributed Representation of Assembly for Robust Binary Diffing Against Semantic Differences | TSE | 2021 | | | | |
Codee: A Tensor Embedding Scheme for Binary Code Search | TSE | 2021 | | | | |
Revisiting Binary Code Similarity Analysis using Interpretable Feature Engineering and Lessons Learned | TSE (revision) | 2021 | | | | |
How could Neural Networks understand Programs? | ICML 2021 | 2021 | | | | |
Multi-threshold token-based code clone detection | SANER 2021 | 2021 | | | | |
FastSpec: Scalable Generation and Detection of Spectre Gadgets Using Neural Embeddings | IEEE Euro S&P 2021 | 2021 | | | | |
TREX: Learning Execution Semantics from Micro-Traces for Binary Similarity | | 2020 | | | | |
Similarity of Binaries Across Optimization Levels and Obfuscation | ESORICS 2020 | 2020 | | | | |
Open-source tools and benchmarks for code-clone detection: past, present, and future trends | | 2020 | | | | |
Semantically Find Similar Binary Codes with Mixed Key Instruction Sequence | | 2020 | | | | |
LibDX: A Cross-Platform and Accurate System to Detect Third-Party Libraries in Binary Code | | 2020 | | | | |
Detecting Code Clones with Graph Neural Network and Flow-Augmented Abstract Syntax Tree | SANER | 2020 | | | | |
What You See is What it Means! Semantic Representation Learning of Code based on Visualization and Transfer Learning | | 2020 | | | | |
Clone Detection on Large Scala Codebases | | 2020 | | | | |
CloneCompass: Visualizations for Code Clone Analysis | | 2020 | | | | |
DEEPBINDIFF: Learning Program-Wide Code Representations for Binary Diffing | NDSS | 2020 | | | | |
VGraph: A Robust Vulnerable Code Clone Detection System Using Code Property Triplets | EuroS&P | 2020 | | | | |
Order Matters: Semantic-Aware Neural Networks for Binary Code Similarity Detection | AAAI | 2020 | | | | |
Similarity Metric Method for Binary Basic Blocks of Cross-Instruction Set Architecture | NDSS | 2020 | | | | |
Investigating Graph Embedding Neural Networks with Unsupervised Features Extraction for Binary Analysis | NDSS Workshop on Binary Analysis Research (BAR) | 2019 | | | | |
Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization | IEEE S&P | 2019 | | | | |
Semantic-Based Representation Binary Clone Detection for Cross-Architectures in the Internet of Things | MDPI | 2019 | | | | |
A Survey of Binary Code Similarity | CSUR | 2019 | | | | |
代码克隆检测研究进展 (Research Progress on Code Clone Detection) | 软件学报 (Journal of Software) | 2019 | | | | |
A Systematic Review on Code Clone Detection | | 2019 | | | | |
A Cross-Architecture Instruction Embedding Model for Natural Language Processing-Inspired Binary Code Analysis | NDSS | 2019 | | | | |
Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs | NDSS | 2019 | | | | |
SAFE: Self-Attentive Function Embeddings for Binary Similarity | | 2019 | | | | |
Learning-Based Recursive Aggregation of Abstract Syntax Trees for Code Clone Detection | SANER | 2019 | | | | |
基于深度学习的跨平台二进制代码关联分析 (Deep Learning-Based Cross-Platform Binary Code Association Analysis) | | 2019 | | | | |
CVSkSA: cross-architecture vulnerability search in firmware based on kNN-SVM and attributed control flow graph | | 2019 | | | | |
Function matching between binary executables: efficient algorithms and features | JCVHT | 2019 | | | | |
BinMatch: A Semantics-based Hybrid Approach on Binary Code Clone Analysis | ICSME | 2018 | | | | |
αDiff: Cross-Version Binary Code Similarity Detection with DNN | ASE | 2018 | | | | |
Binary Similarity Detection Using Machine Learning | PLDI | 2018 | | | | |
CCAligner: A Token Based Large-Gap Clone Detector | ICSE | 2018 | | | | |
Oreo: Detection of Clones in the Twilight Zone | FSE | 2018 | | | | |
VulSeeker: A Semantic Learning Based Vulnerability Seeker for Cross-platform Binary | ASE | 2018 | | | | |
VulSeeker-pro: enhanced semantic learning based binary vulnerability seeker with emulation | | 2018 | | | | |
FirmUp: Precise Static Detection of Common Vulnerabilities in Firmware | | 2018 | | | | |
BINARM: Scalable and Efficient Detection of Vulnerabilities in Firmware Images of Intelligent Electronic Devices | | 2018 | | | | |
A Resilient and Efficient System for Identifying FOSS Functions in Malware Binaries | | 2018 | | | | |
Beyond Precision and Recall: Understanding Uses (and Misuses) of Similarity Hashes in Binary Analysis | | 2018 | | | | |
BCD: Decomposing Binary Code Into Components Using Graph-Based Clustering | ASIA CCS | 2018 | | | | |
A Deep Learning Approach to Program Similarity | MASES | 2018 | | | | |
Recurrent Neural Network for Code Clone Detection | SEIM | 2018 | | | | |
The Adverse Effects of Code Duplication in Machine Learning Models of Code | | 2018 | | | | |
Benchmarks for software clone detection: A ten-year retrospective | SANER | 2018 | | | | |
Binary Code Clone Detection across Architectures and Compiling Configurations | ICPC | 2017 | | | | |
Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection | ACM CCS | 2017 | | | | |
BinSequence: Fast, Accurate and Scalable Binary Code Reuse Detection | ASIA CCS | 2017 | | | | |
BinShape: Scalable and Robust Binary Library Function Identification Using Function Shape | DIMVA | 2017 | | | | |
Compiler-agnostic function detection in binaries | IEEE EuroS&P | 2017 | | | | |
BinSign: Fingerprinting binary functions to support automated analysis of code executables | | 2017 | | | | |
Similarity of binaries through re-optimization | PLDI | 2017 | | | | |
Transferring code-clone detection and analysis to practice | ICSE-SEIP | 2017 | | | | |
Cryptographic Function Detection in Obfuscated Binaries via Bit-Precise Symbolic Loop Mapping | IEEE S&P | 2017 | | | | |
Supervised Deep Features for Software Functional Clone Detection by Exploiting Lexical and Syntactical Information in Source Code | IJCAI | 2017 | | | | |
Extracting Conditional Formulas for Cross-Platform Bug Search | ASIA CCS | 2017 | | | | |
SPAIN: Security Patch Analysis for Binaries Towards Understanding the Pain and Pills | ICSE | 2017 | | | | |
CCLearner: A Deep Learning-Based Clone Detection Approach | | 2017 | | | | |
BinSim: Trace-based Semantic Binary Diffing via System Call Sliced Segment Equivalence Checking | USENIX | 2017 | | | | |
In-memory Fuzzing for Binary Code Similarity Analysis | ASE | 2017 | | | | |
DéjàVu: a map of code duplicates on GitHub | OOPSLA | 2017 | | | | |
Some from Here, Some from There: Cross-project Code Reuse in GitHub | MSR | 2017 | | | | |
CVSSA: Cross-Architecture Vulnerability Search in Firmware Based on Support Vector Machine and Attributed Control Flow Graph | | 2017 | | | | |
Identifying Functionally Similar Code in Complex Codebases | ICPC | 2016 | | | | |
Scalable graph-based bug search for firmware images (Genius) | ACM CCS | 2016 | | | | |
Cross-Architecture Binary Semantics Understanding via Similar Code Comparison | IEEE SANER | 2016 | | | | |
discovRE: Efficient cross-architecture identification of bugs in binary code | NDSS | 2016 | | | | |
BinGo: Cross-architecture cross-OS Binary Search | FSE | 2016 | | | | |
Kam1n0: Mapreduce-based assembly clone search for reverse engineering | KDD | 2016 | | | | |
Statistical similarity of binaries | PLDI | 2016 | | | | |
Deep learning code fragments for code clone detection | ASE | 2016 | | | | |
A Survey of Software Clone Detection Techniques | | 2016 | | | | |
SourcererCC: Scaling Code Clone Detection to Big Code | ICSE | 2016 | | | | |
Binary executable file similarity calculation using function matching | | 2016 | | | | |
Matching Similar Functions in Different Versions of a Malware | | 2016 | | | | |
BinDNN: Resilient Function Matching Using Deep Learning | | 2016 | | | | |
VulPecker: An Automated Vulnerability Detection System Based on Code Similarity Analysis | ACSAC | 2016 | | | | |
BigCloneEval: A Clone Detection Tool Evaluation Framework with BigCloneBench | | 2016 | | | | |
Cross-architecture bug search in binary executables | IEEE S&P | 2015 | | | | |
Library functions identification in binary code by using graph isomorphism testings | | 2015 | | | | |
Evaluating clone detection tools with BigCloneBench | | 2015 | | | | |
Memoized semantics-based binary diffing with application to malware lineage inference | | 2015 | | | | |
Sigma: A semantic integrated graph matching approach for identifying reused functions in binary code | | 2015 | | | | |
BYTEWEIGHT: Learning to Recognize Functions in Binary Code | USENIX | 2014 | | | | |
Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection | FSE | 2014 | | | | |
Binclone: Detecting code clones in malware | SERE | 2014 | | | | |
Detecting fine-grained similarity in binaries | | 2014 | | | | |
Leveraging semantic signatures for bug search in binary programs | ACSAC | 2014 | | | | |
How Accurate Is Coarse-grained Clone Detection?: Comparision with Fine-grained Detectors | | 2014 | | | | |
Tracelet-based code search in executables | PLDI | 2014 | | | | |
Control Flow-Based Malware Variant Detection | | 2014 | | | | |
Hashing for Similarity Search: A Survey | | 2014 | | | | |
Achieving accuracy and scalability simultaneously in detecting application clones on android markets | ICSE | 2014 | | | | |
Identifying Shared Software Components to Support Malware Forensics | | 2014 | | | | |
Evaluating Modern Clone Detection Tools | | 2014 | | | | |
Rendezvous: a search engine for binary code | MSR | 2013 | | | | |
Binslayer: accurate comparison of binary executables | PPREW | 2013 | | | | |
Software clone detection: A systematic review | | 2013 | | | | |
How to extract differences from similar programs? A cohesion metric approach | | 2013 | | | | |
Software clone detection and refactoring | | 2013 | | | | |
An Emerging Approach towards Code Clone Detection: Metric Based Approach on Byte Code | | 2013 | | | | |
A hybrid-token and textual based approach to find similar code segments | | 2013 | | | | |
Gapped code clone detection with lightweight source code analysis | | 2013 | | | | |
MutantX-S: Scalable Malware Clustering Based on Static Features | USENIX | 2013 | | | | |
Binjuice: Fast Location of Similar Code Fragments Using Semantic Juice | PPREW | 2013 | | | | |
Towards Automatic Software Lineage Inference | USENIX | 2013 | | | | |
AnDarwin: Scalable Detection of Semantically Similar Android Applications | | 2013 | | | | |
Expose: Discovering potential binary code re-use | | 2013 | | | | |
Function Matching-based Binary level Software Similarity Calculation | RACS | 2013 | | | | |
FIRMA: Malware Clustering and Network Signature Generation with Mixed Network Behaviors | RAID | 2013 | | | | |
A study of repetitiveness of code changes in software evolution | ASE | 2013 | | | | |
ibinhunt: Binary hunting with interprocedural control flow | | 2012 | | | | |
ReDeBug: Finding Unpatched Code Clones in Entire OS Distributions | USENIX | 2012 | | | | |
Boreas: an accurate and scalable token-based approach to code clone detection | ASE | 2012 | | | | |
Folding Repeated Instructions for Improving Token-Based Code Clone Detection | | 2012 | | | | |
A metrics-based data mining approach for software clone detection | | 2012 | | | | |
Comparison of Clone Detection Techniques | | 2012 | | | | |
Malware Classification Method via Binary Content Comparison | RACS | 2012 | | | | |
Binary function clustering using semantic hashes | ICMLA | 2012 | | | | |
Value-based program characterization and its application to software plagiarism detection | | 2011 | | | | |
CMCD: Count Matrix Based Code Clone Detection | | 2011 | | | | |
Incremental code clone detection: A pdg-based approach | | 2011 | | | | |
Anywhere, Any-Time Binary Instrumentation | | 2011 | | | | |
Code reuse in open source software development: Quantitative evidence, drivers, and impediments | | 2010 | | | | |
Index-based code clone detection: incremental, distributed, scalable | | 2010 | | | | |
Detection of Type-1 and Type-2 Code Clones Using Textual Analysis and Metrics | | 2010 | | | | |
Ghezzi, A hybrid approach (syntactic and textual) to clone detection | | 2010 | | | | |
Evaluating code clone genealogies at release level: An empirical study | | 2010 | | | | |
A survey of Binary similarity and distance measures | | 2010 | | | | |
Idea: Opcode-Sequence-Based Malware Detection | | 2010 | | | | |
Behavioral Clustering of HTTP-Based Malware and Signature Generation Using Malicious Network Traces | USENIX | 2010 | | | | |
Data fingerprinting with similarity digests | | 2010 | | | | |
Automatic mining of functionally equivalent code fragments via random testing | | 2009 | | | | |
A mutation/injection-based automatic framework for evaluating code clone detection tools | | 2009 | | | | |
Problematic code clones identification using multiple detection results | | 2009 | | | | |
Incremental clone detection | | 2009 | | | | |
Scalable and incremental clone detection for evolving software | | 2009 | | | | |
Large-scale Malware Indexing Using Function-call Graphs | | 2009 | | | | |
Scalable, Behavior-Based Malware Clustering | | 2009 | | | | |
peHash: A Novel Approach to Fast Malware Clustering | USENIX | 2009 | | | | |
Detecting Code Clones in Binary Executables | | 2009 | | | | |
Binhunt: Automatically finding semantic differences in binary programs | | 2008 | | | | |
Scalable detection of semantic clones | | 2008 | | | | |
Deckard: Scalable and accurate tree-based detection of code clones | | 2007 | | | | |
Large-scale code reuse in open source software | | 2007 | | | | |
A survey on software clone detection research | | 2007 | | | | |
A study of consistent and inconsistent changes to code clones | | 2007 | | | | |
Comparison and evaluation of clone detection tools | | 2007 | | | | |
Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions | | 2007 | | | | |
A Static Birthmark of Binary Executables Based on API Call Structure | | 2007 | | | | |
CP-Miner: Finding copy-paste and related bugs in large-scale software code | | 2006 | | | | |
Survey of research on software clones | | 2006 | | | | |
“Cloning considered harmful” considered harmful: patterns of cloning in software | | 2006 | | | | |
GPLAG: detection of software plagiarism by program dependence graph analysis | | 2006 | | | | |
Detecting Self-mutating Malware Using Control-flow Graph Matching | | 2006 | | | | |
Identifying Almost Identical Files Using Context Triggered Piecewise Hashing | | 2006 | | | | |
Hamsa: Fast signature generation for zero-day polymorphic worms with provable attack resilience | IEEE S&P | 2006 | | | | |
Graph-based comparison of executable objects | | 2005 | | | | |
SDD: high performance code clone detection system for large scale source code | | 2005 | | | | |
Polygraph: Automatically generating signatures for polymorphic worms | | 2005 | | | | |
K-gram Based Software Birthmarks | | 2005 | | | | |
Insights into System-Wide Code Duplication | IEEE | 2004 | | | | |
Clone detection in source code by frequent itemset techniques | | 2004 | | | | |
Evaluating clone detection techniques from a refactoring perspective | | 2004 | | | | |
Structural comparison of executable objects | | 2004 | | | | |
Code compaction of matching single-entry multiple-exit regions | | 2003 | | | | |
CloSpan: Mining closed sequential patterns in large datasets | | 2003 | | | | |
Ccfinder: a multilinguistic token-based code clone detection system for large scale source code | | 2002 | | | | |
Identifying similar code with program dependence graphs | | 2001 | | | | |
Using slicing to identify duplication in source code | | 2001 | | | | |
BMAT – A Binary Matching Tool for Stale Profile Propagation | | 2000 | | | | |
A language independent approach for detecting duplicated code | | 1999 | | | | |
Compressing Differences of Executable Code | | 1999 | | | | |
Similarity search in high dimensions via hashing | | 1999 | | | | |
Clone detection using abstract syntax trees | | 1998 | | | | |
Experiment on the Automatic Detection of Function Clones in a Software System Using Metrics | | 1996 | | | | |
Pattern matching for clone and concept detection | | 1996 | | | | |
On finding duplication and near-duplication in large software systems | | 1995 | | | | |
Detecting code similarity using patterns | | 1995 | | | | |
A Cross-platform Binary Diff | | 1995 |