EPPTA: Efficient partially observable reinforcement learning agent for penetration testing applications

Abstract In recent years, penetration testing (pen‐testing) has emerged as a crucial process for evaluating the security level of network infrastructures by simulating real‐world cyber‐attacks. Automating pen‐testing through reinforcement learning (RL) facilitates more frequent assessments, minimize...

Bibliographic Details
Main Authors:	Zegang Li, Qian Zhang, Guangwen Yang
Format:	Article
Language:	English
Published:	Wiley 2025-01-01
Series:	Engineering Reports
Subjects:	asynchronous RL; optimizations; partial observable; penetration testing
Online Access:	https://doi.org/10.1002/eng2.12818

_version_	1832576597290385408
author	Zegang Li; Qian Zhang; Guangwen Yang
collection	DOAJ
description	Abstract In recent years, penetration testing (pen‐testing) has emerged as a crucial process for evaluating the security level of network infrastructures by simulating real‐world cyber‐attacks. Automating pen‐testing through reinforcement learning (RL) facilitates more frequent assessments, minimizes human effort, and enhances scalability. However, real‐world pen‐testing tasks often involve incomplete knowledge of the target network system. Effectively managing the intrinsic uncertainties via partially observable Markov decision processes (POMDPs) constitutes a persistent challenge within the realm of pen‐testing. Furthermore, RL agents are compelled to formulate intricate strategies to contend with the challenges posed by partially observable environments, thereby engendering augmented computational and temporal expenditures. To address these issues, this study introduces EPPTA (efficient POMDP‐driven penetration testing agent), an agent built on an asynchronous RL framework, designed for conducting pen‐testing tasks within partially observable environments. We incorporate an implicit belief module in EPPTA, grounded on the belief update formula of the traditional POMDP model, which represents the agent's probabilistic estimation of the current environment state. Furthermore, by integrating the algorithm with the high‐performance RL framework, Sample Factory, EPPTA significantly reduces convergence time compared to existing pen‐testing methods, resulting in an approximately 20‐fold acceleration. Empirical results across various pen‐testing scenarios validate EPPTA's superior task reward performance and enhanced scalability, providing substantial support for efficient and advanced evaluation of network infrastructure security.
format	Article
id	doaj-art-1080b7856c6740c3b37af4770043d279
institution	Kabale University
issn	2577-8196
language	English
publishDate	2025-01-01
publisher	Wiley
record_format	Article
series	Engineering Reports
spelling	doaj-art-1080b7856c6740c3b37af4770043d279 | 2025-01-31T00:22:48Z | eng | Wiley | Engineering Reports | 2577-8196 | 2025-01-01 | 7 | 1 | n/a | n/a | 10.1002/eng2.12818 | EPPTA: Efficient partially observable reinforcement learning agent for penetration testing applications | Zegang Li (Department of Computer Science and Technology, Tsinghua University, Beijing, China) | Qian Zhang (National Supercomputing Center in Wuxi, Wuxi, China) | Guangwen Yang (Department of Computer Science and Technology, Tsinghua University, Beijing, China) | https://doi.org/10.1002/eng2.12818 | asynchronous RL | optimizations | partial observable | penetration testing
title	EPPTA: Efficient partially observable reinforcement learning agent for penetration testing applications
topic	asynchronous RL; optimizations; partial observable; penetration testing
url	https://doi.org/10.1002/eng2.12818

Similar Items