Reinforcement Learning-Based Generative Security Framework for Host Intrusion Detection

Protecting users’ systems from evolving cybercrime is becoming increasingly challenging. Attackers create more complicated attack patterns and configure attack behavior to resemble normal behavior to evade detection by defenders. Thus, it is indispensable to configure a security system th...

Bibliographic Details
Main Authors:	Yongsik Kim, Su-Youn Hong, Sungjin Park, Huy Kang Kim
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Reinforcement learning; natural language processing; host-based intrusion detection system
Online Access:	https://ieeexplore.ieee.org/document/10848062/

author	Yongsik Kim; Su-Youn Hong; Sungjin Park; Huy Kang Kim
collection	DOAJ
description	Protecting users’ systems from evolving cybercrime is becoming increasingly challenging. Attackers create more complicated attack patterns and configure attack behavior to resemble normal behavior to evade detection by defenders. Thus, it is indispensable to configure a security system that accurately detects attacks on each user’s system. Since attacks do not occur only at a specific point in the network, identifying computer intrusions using network packets alone has limitations. A Host-based Intrusion Detection System (HIDS) is a highly effective tool for monitoring computer systems and detecting unusual or unauthorized activities. HIDS can quickly identify potential security threats by closely monitoring and analyzing system logs, configurations, file integrity, and events specific to a host machine. It helps maintain the security and integrity of individual systems by detecting unauthorized activities or policy violations. With its advanced capabilities and reliable performance, HIDS is essential to any comprehensive host-based security strategy. Although HIDS can detect insider intrusions, known HIDS detection methods are limited to specific attacks and may be ineffective against new attack patterns. Recently, researchers have applied Natural Language Processing (NLP) in HIDS to scrutinize complex attack patterns, but these approaches have not effectively provided useful outputs for detecting intrusions based on these patterns. In this paper, we use a reinforcement learning methodology (Actor-Critic) and NLP to extract keywords that occur in each anomalous system call log, and we propose a rule generation framework that uses the extracted words to prevent future intrusions. We analyze the anomaly log using NLP and extract the characteristics of each attack log as ‘keywords.’ Based on the unique keywords of each attack log, we utilize reinforcement learning to establish a set of rules to protect against attacks.
We extracted keywords from the system call log sequence using TextRank and simultaneously provided ground truth data using the extracted keywords. Based on the extracted keywords, the pre-trained Seq2Seq model generates rules according to the reward calculation method in reinforcement learning. When calculating the reward, we used three values to create the rule set: the comparison value with the pre-trained Seq2Seq model, the malware log sequences detected by the rule set based on reinforcement learning, and the false positive value generated by the normal data. We verified the proposed framework using two system call log datasets: ADFA-LD and LID-DS 2021. The proposed framework demonstrated a high average accuracy of 96.5% across different attacks. We compared the detection accuracy of the proposed framework with that of TextRank and Seq2Seq model-based keyword extraction methods. As a result, the proposed framework showed relatively high accuracy against various attack logs.
format	Article
id	doaj-art-44e03b1f623b4e61b9cad51d6c22d505
institution	Kabale University
issn	2169-3536
language	English
publishDate	2025-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj-art-44e03b1f623b4e61b9cad51d6c22d505 | 2025-01-28T00:01:44Z | eng | IEEE | IEEE Access | 2169-3536 | 2025-01-01 | 13 | 15346 | 15362 | 10.1109/ACCESS.2025.3532353 | 10848062 | Reinforcement Learning-Based Generative Security Framework for Host Intrusion Detection | Yongsik Kim (0; https://orcid.org/0009-0005-1558-1180) | Su-Youn Hong (1) | Sungjin Park (2) | Huy Kang Kim (3; https://orcid.org/0000-0002-0760-8807) | School of Cybersecurity, Korea University, Seoul, Republic of Korea | LIG Nex1, Yongin-si, South Korea | LIG Nex1, Yongin-si, South Korea | School of Cybersecurity, Korea University, Seoul, Republic of Korea | https://ieeexplore.ieee.org/document/10848062/ | Reinforcement learning | natural language processing | host-based intrusion detection system
title	Reinforcement Learning-Based Generative Security Framework for Host Intrusion Detection
topic	Reinforcement learning; natural language processing; host-based intrusion detection system
url	https://ieeexplore.ieee.org/document/10848062/
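
The abstract in this record describes two steps that a short example may make more concrete: TextRank-based keyword extraction from a system call sequence, and a reward that combines a comparison against a reference output, the attack logs a rule detects, and the false positives it raises on normal data. The Python sketch below is only an illustration under stated assumptions and is not the authors' implementation. The function names (`textrank_keywords`, `rule_matches`, `reward`), the co-occurrence window, the use of Jaccard overlap as the "comparison value", the keyword-set rule format, and the weighted-sum form of the reward are all hypothetical choices made for this example.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank system call names with a TextRank-style score (illustrative only)."""
    # Build an undirected co-occurrence graph: two distinct tokens are linked
    # when they appear within `window` positions of each other.
    neighbors = defaultdict(set)
    for i, tok in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != tok:
                neighbors[tok].add(tokens[j])
                neighbors[tokens[j]].add(tok)
    if not neighbors:
        return []
    # Iterate the unweighted PageRank update that TextRank is built on.
    scores = {tok: 1.0 for tok in neighbors}
    for _ in range(iterations):
        scores = {
            tok: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[tok])
            for tok in neighbors
        }
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]


def rule_matches(rule, tokens):
    """Assumed rule format: a non-empty keyword set that must all occur in the log."""
    rule = set(rule)
    return bool(rule) and rule <= set(tokens)


def reward(rule, reference_rule, attack_logs, normal_logs,
           w_sim=1.0, w_det=1.0, w_fp=1.0):
    """Hypothetical reward: similarity + detection rate - false positive rate."""
    rule, reference_rule = set(rule), set(reference_rule)
    union = rule | reference_rule
    # Assumption: Jaccard overlap stands in for the comparison with the
    # pre-trained model's output.
    similarity = len(rule & reference_rule) / len(union) if union else 0.0
    # Fraction of attack logs the rule flags.
    detection = sum(rule_matches(rule, log) for log in attack_logs) / max(len(attack_logs), 1)
    # Fraction of normal logs the rule flags by mistake.
    false_pos = sum(rule_matches(rule, log) for log in normal_logs) / max(len(normal_logs), 1)
    return w_sim * similarity + w_det * detection - w_fp * false_pos
```

In use, one might call `textrank_keywords(["open", "read", "execve", "socket", "connect"])` on a tokenized attack trace to obtain candidate keywords, then score a candidate rule with `reward(candidate, reference, attack_logs, normal_logs)`. An actor-critic learner would treat such a score as the reward signal, though the weighting and exact terms used in the paper may differ.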

Similar Items