An MLLM-Assisted Web Crawler Approach for Web Application Fuzzing

Bibliographic Details
Main Authors: Wantong Yang, Enze Wang, Zhiwen Gui, Yuan Zhou, Baosheng Wang, Wei Xie
Format: Article
Language: English
Published: MDPI AG 2025-01-01
Series: Applied Sciences
Subjects: multi-modal large language models; web crawlers; web application fuzzing
Online Access: https://www.mdpi.com/2076-3417/15/2/962
collection DOAJ
description Web application fuzzing faces significant challenges in achieving comprehensive test interface (attack surface) coverage, primarily due to the complexity of user interactions and dynamic website architectures. While web crawlers can automatically access and extract critical website information—including form fields and request parameters—which are essential for generating effective fuzzing test cases, current crawler technologies exhibit three primary limitations: (i) insufficient capabilities in analyzing page relationships and determining page states; (ii) lack of functionality-aware exploration capabilities, resulting in generated inputs with poor contextual relevance; (iii) generation of unstructured operation sequences that fail to execute effectively due to their incompatibility with state-based testing logic. To address these challenges, we propose CrawlMLLM, a framework using multi-modal large language models to simulate human web browsing. It includes three core components: page state mining, functionality analysis, and automatic operation generation. Evaluations show 163% code coverage improvements over SOTA work. When integrated with vulnerability audit tools, CrawlMLLM found 44 vulnerabilities in three vulnerable web applications versus 34 by the baseline. In six real-world applications, CrawlMLLM detected 20 vulnerabilities while the next best method found six.
id doaj-art-cde51dda4a9a40bead56f3cb797bd99f
institution Kabale University
issn 2076-3417
spelling doaj-art-cde51dda4a9a40bead56f3cb797bd99f
timestamp 2025-01-24T13:21:30Z
doi 10.3390/app15020962
volume 15
issue 2
article 962
affiliation College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China (all six authors)
topic multi-modal large language models
web crawlers
web application fuzzing