MSSA: multi-stage semantic-aware neural network for binary code similarity detection
Binary code similarity detection (BCSD) aims to identify whether a pair of binary code snippets is similar and is widely used for tasks such as malware analysis, patch analysis, and clone detection. Current state-of-the-art approaches are based on Transformers, which require substantial computatio...
Main Authors: | Bangrui Wan, Jianjun Zhou, Ying Wang, Feng Chen, Ying Qian |
---|---|
Format: | Article |
Language: | English |
Published: | PeerJ Inc., 2025-01-01 |
Series: | PeerJ Computer Science |
Subjects: | Binary analysis; Similarity detection; Neural network |
Online Access: | https://peerj.com/articles/cs-2504.pdf |
_version_ | 1832594288981049344 |
---|---|
author | Bangrui Wan; Jianjun Zhou; Ying Wang; Feng Chen; Ying Qian |
collection | DOAJ |
description | Binary code similarity detection (BCSD) aims to identify whether a pair of binary code snippets is similar and is widely used for tasks such as malware analysis, patch analysis, and clone detection. Current state-of-the-art approaches are based on Transformers, which require substantial computational resources. Learning-based approaches still leave room for improvement in learning the deeper semantics of binary code. In this paper, we propose MSSA, a multi-stage semantic-aware neural network for BCSD at the function level. It integrates the semantic and structural information of assembly instructions within basic blocks, between basic blocks, and across the entire function through four semantic-aware neural networks, achieving a deep understanding of binary code semantics. MSSA is a lightweight model with only 0.38M parameters in its backbone network, making it suitable for deployment in CPU environments. Experimental results show that MSSA outperforms Gemini, Asm2Vec, SAFE, and jTrans in classification performance and ranks second only to the Transformer-based jTrans in retrieval performance. |
format | Article |
id | doaj-art-715cba494c79410eb2654a6a638a2ab3 |
institution | Kabale University |
issn | 2376-5992 |
language | English |
publishDate | 2025-01-01 |
publisher | PeerJ Inc. |
record_format | Article |
series | PeerJ Computer Science |
spelling | doaj-art-715cba494c79410eb2654a6a638a2ab3; 2025-01-19T15:05:10Z; eng; PeerJ Inc.; PeerJ Computer Science; ISSN 2376-5992; 2025-01-01; vol. 11; e2504; DOI 10.7717/peerj-cs.2504; MSSA: multi-stage semantic-aware neural network for binary code similarity detection; Bangrui Wan, Jianjun Zhou, Ying Wang, Feng Chen, Ying Qian (all: School of Software Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China); https://peerj.com/articles/cs-2504.pdf; Binary analysis; Similarity detection; Neural network |
title | MSSA: multi-stage semantic-aware neural network for binary code similarity detection |
topic | Binary analysis; Similarity detection; Neural network |
url | https://peerj.com/articles/cs-2504.pdf |