An MLLM-Assisted Web Crawler Approach for Web Application Fuzzing

Web application fuzzing faces significant challenges in achieving comprehensive test interface (attack surface) coverage, primarily due to the complexity of user interactions and dynamic website architectures. While web crawlers can automatically access and extract critical website information—inclu...

Bibliographic Details
Main Authors:	Wantong Yang, Enze Wang, Zhiwen Gui, Yuan Zhou, Baosheng Wang, Wei Xie
Format:	Article
Language:	English
Published:	MDPI AG 2025-01-01
Series:	Applied Sciences
Subjects:	multi-modal large language models, web crawlers, web application fuzzing
Online Access:	https://www.mdpi.com/2076-3417/15/2/962

_version_	1832589219167469568
author	Wantong Yang, Enze Wang, Zhiwen Gui, Yuan Zhou, Baosheng Wang, Wei Xie
author_facet	Wantong Yang, Enze Wang, Zhiwen Gui, Yuan Zhou, Baosheng Wang, Wei Xie
author_sort	Wantong Yang
collection	DOAJ
description	Web application fuzzing faces significant challenges in achieving comprehensive coverage of the test interface (attack surface), primarily due to the complexity of user interactions and dynamic website architectures. Web crawlers can automatically access and extract critical website information, including form fields and request parameters, which is essential for generating effective fuzzing test cases. However, current crawler technologies exhibit three primary limitations: (i) insufficient capabilities in analyzing page relationships and determining page states; (ii) a lack of functionality-aware exploration capabilities, resulting in generated inputs with poor contextual relevance; and (iii) generation of unstructured operation sequences that fail to execute effectively because they are incompatible with state-based testing logic. To address these challenges, we propose CrawlMLLM, a framework that uses multi-modal large language models to simulate human web browsing. It includes three core components: page state mining, functionality analysis, and automatic operation generation. Evaluations show a 163% improvement in code coverage over state-of-the-art (SOTA) work. When integrated with vulnerability audit tools, CrawlMLLM found 44 vulnerabilities in three vulnerable web applications, versus 34 found by the baseline. In six real-world applications, CrawlMLLM detected 20 vulnerabilities, while the next-best method found six.
format	Article
id	doaj-art-cde51dda4a9a40bead56f3cb797bd99f
institution	Kabale University
issn	2076-3417
language	English
publishDate	2025-01-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj-art-cde51dda4a9a40bead56f3cb797bd99f; 2025-01-24T13:21:30Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2025-01-01; 15(2), 962; 10.3390/app15020962; An MLLM-Assisted Web Crawler Approach for Web Application Fuzzing; Wantong Yang, Enze Wang, Zhiwen Gui, Yuan Zhou, Baosheng Wang, Wei Xie (all: College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China); https://www.mdpi.com/2076-3417/15/2/962; multi-modal large language models, web crawlers, web application fuzzing
title	An MLLM-Assisted Web Crawler Approach for Web Application Fuzzing
title_sort	mllm assisted web crawler approach for web application fuzzing
topic	multi-modal large language models, web crawlers, web application fuzzing
url	https://www.mdpi.com/2076-3417/15/2/962

Similar Items