Program semantic analysis model for code reuse detection

Program similarity analysis has a wide range of applications in areas such as code plagiarism detection and intellectual property protection, but it generally suffers from problems such as excessive computational overhead. To address this, a code similarity analysis method based on fuzzy matching and statistical inference is proposed. F...

Bibliographic Details
Main Authors: GUO Xi, WANG Pan
Format: Article
Language: zho
Published: Editorial Department of Journal on Communications 2024-12-01
Series: Tongxin xuebao
Subjects: program analysis; fuzzy matching; statistical inference; machine learning
Online Access: http://www.joconline.com.cn/zh/article/doi/10.11959/j.issn.1000-436x.2024269/
collection DOAJ
description Program similarity analysis has a wide range of applications in areas such as code plagiarism detection and intellectual property protection, but it generally suffers from problems such as excessive computational overhead. To address this, a code similarity analysis method based on fuzzy matching and statistical inference is proposed. For binary programs, disassembly analysis is performed first, followed by function boundary recognition to extract the execution boundary information of each function. On this basis, dynamic programming is used to compute similarity at basic-block granularity, and a neighborhood search over the control flow graph extends the similarity analysis from the basic-block level to the function level. Finally, the semantic similarity of binary files is obtained through statistical analysis of the similar functions. During this process, the pre-trained model is optimized and analyzed, and its parameters are tuned to enable similarity analysis of cross-platform code. The experimental results show that the proposed method significantly improves analysis accuracy over traditional analysis tools, with an average increase of 7.1% in analysis accuracy compared to current mainstream analysis tools.
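The block-to-function-to-file pipeline summarized in the description can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say which dynamic programming formulation, matching rule, or threshold is used, so the sketch assumes a longest-common-subsequence score over instruction mnemonics, a best-match average in place of the control-flow-graph neighborhood search, and a hypothetical `threshold` of 0.8. All function names are illustrative.

```python
from typing import List, Sequence

Block = Sequence[str]        # one basic block as a list of instruction mnemonics
Function = Sequence[Block]   # one function as a list of basic blocks


def lcs_length(a: Block, b: Block) -> int:
    """Longest common subsequence length via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def block_similarity(a: Block, b: Block) -> float:
    """Assumed fuzzy score in [0, 1]: 2 * LCS / (len(a) + len(b))."""
    if not a and not b:
        return 1.0
    return 2.0 * lcs_length(a, b) / (len(a) + len(b))


def function_similarity(f: Function, g: Function) -> float:
    """Mean best-match block score; a stand-in that ignores CFG structure."""
    if not f or not g:
        return 0.0
    return sum(max(block_similarity(x, y) for y in g) for x in f) / len(f)


def binary_similarity(fa: List[Function], fb: List[Function],
                      threshold: float = 0.8) -> float:
    """Fraction of functions in fa with a match in fb at or above threshold.

    The threshold value is a hypothetical choice, not taken from the paper.
    """
    if not fa or not fb:
        return 0.0
    hits = sum(1 for f in fa
               if max(function_similarity(f, g) for g in fb) >= threshold)
    return hits / len(fa)
```

In this sketch, two blocks such as `["mov", "add", "ret"]` and `["mov", "sub", "ret"]` share a common subsequence of length 2 and therefore score 2·2/6. A fuller treatment would normalize operands before comparison and propagate matches along control-flow edges, as the description indicates.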
format Article
id doaj-art-874549224ede4749b9bd7443d27fc965
institution Kabale University
issn 1000-436X
language zho
publishDate 2024-12-01
publisher Editorial Department of Journal on Communications
record_format Article
series Tongxin xuebao
topic program analysis
fuzzy matching
statistical inference
machine learning