LLM-Based Unknown Function Automated Modeling in Sensor-Driven Systems for Multi-Language Software Security Verification

The rapid expansion of the Internet of Things (IoT) has made software security and reliability a critical concern. With multi-language programs running on edge computing, embedded systems, and sensors, each connected device represents a potential attack vector, threatening data integrity and privacy...

Full description

Saved in:
Bibliographic Details
Main Authors: Liangjun Deng, Qi Zhong, Jingcheng Song, Hang Lei, Wenjuan Li
Format: Article
Language:English
Published: MDPI AG 2025-04-01
Series:Sensors
Subjects:
Online Access:https://www.mdpi.com/1424-8220/25/9/2683
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:The rapid expansion of the Internet of Things (IoT) has made software security and reliability a critical concern. With multi-language programs running on edge computing, embedded systems, and sensors, each connected device represents a potential attack vector, threatening data integrity and privacy. Symbolic execution is a key technique for automated vulnerability detection. However, unknown function interfaces, such as sensor interactions, limit traditional concrete or concolic execution due to uncertain function returns and missing symbolic expressions. Compared with system simulation, the traditional method is to construct an interface abstraction layer for the symbolic execution engine to reduce the cost of simulation. Nevertheless, the disadvantage of this solution is that the manual modeling of these functions is very inefficient and requires professional developers to spend hundreds of hours. In order to improve efficiency, we propose an LLM-based automated approach for modeling unknown functions. By fine-tuning a 20-billion-parameter language model, it automatically generates function models based on annotations and function names. Our method improves symbolic execution efficiency, reducing reliance on manual modeling, which is a limitation of existing frameworks like KLEE. Experimental results primarily focus on comparing the usability, accuracy, and efficiency of LLM-generated models with human-written ones. Our approach was integrated into one verification platform project and applied to the verification of smart contracts with distributed edge computing characteristics. The application of this method directly reduces the manual modeling effort from a month to just a few minutes. This provides a foundational validation of our method’s feasibility, particularly in reducing modeling time while maintaining quality. This work is the first to integrate LLMs into formal verification, offering a scalable and automated verification solution for sensor-driven software, blockchain smart contracts, and WebAssembly systems, expanding the scope of secure IoT development.
ISSN:1424-8220