An MLLM-Assisted Web Crawler Approach for Web Application Fuzzing
Web application fuzzing faces significant challenges in achieving comprehensive test interface (attack surface) coverage, primarily due to the complexity of user interactions and dynamic website architectures. While web crawlers can automatically access and extract critical website information—inclu...
Main Authors: | Wantong Yang, Enze Wang, Zhiwen Gui, Yuan Zhou, Baosheng Wang, Wei Xie |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-01-01 |
Series: | Applied Sciences |
Subjects: | multi-modal large language models; web crawlers; web application fuzzing |
Online Access: | https://www.mdpi.com/2076-3417/15/2/962 |
_version_ | 1832589219167469568 |
---|---|
author | Wantong Yang; Enze Wang; Zhiwen Gui; Yuan Zhou; Baosheng Wang; Wei Xie |
author_facet | Wantong Yang; Enze Wang; Zhiwen Gui; Yuan Zhou; Baosheng Wang; Wei Xie |
author_sort | Wantong Yang |
collection | DOAJ |
description | Web application fuzzing faces significant challenges in achieving comprehensive test interface (attack surface) coverage, primarily due to the complexity of user interactions and dynamic website architectures. While web crawlers can automatically access and extract critical website information—including form fields and request parameters—which are essential for generating effective fuzzing test cases, current crawler technologies exhibit three primary limitations: (i) insufficient capabilities in analyzing page relationships and determining page states; (ii) lack of functionality-aware exploration capabilities, resulting in generated inputs with poor contextual relevance; (iii) generation of unstructured operation sequences that fail to execute effectively due to their incompatibility with state-based testing logic. To address these challenges, we propose CrawlMLLM, a framework using multi-modal large language models to simulate human web browsing. It includes three core components: page state mining, functionality analysis, and automatic operation generation. Evaluations show 163% code coverage improvements over SOTA work. When integrated with vulnerability audit tools, CrawlMLLM found 44 vulnerabilities in three vulnerable web applications versus 34 by the baseline. In six real-world applications, CrawlMLLM detected 20 vulnerabilities while the next best method found six. |
format | Article |
id | doaj-art-cde51dda4a9a40bead56f3cb797bd99f |
institution | Kabale University |
issn | 2076-3417 |
language | English |
publishDate | 2025-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj-art-cde51dda4a9a40bead56f3cb797bd99f; 2025-01-24T13:21:30Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2025-01-01; 15(2), 962; DOI 10.3390/app15020962; An MLLM-Assisted Web Crawler Approach for Web Application Fuzzing; Wantong Yang, Enze Wang, Zhiwen Gui, Yuan Zhou, Baosheng Wang, Wei Xie (all: College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China); https://www.mdpi.com/2076-3417/15/2/962; multi-modal large language models; web crawlers; web application fuzzing |
title | An MLLM-Assisted Web Crawler Approach for Web Application Fuzzing |
title_sort | mllm assisted web crawler approach for web application fuzzing |
topic | multi-modal large language models; web crawlers; web application fuzzing |
url | https://www.mdpi.com/2076-3417/15/2/962 |