Reinforcement Learning-Based Generative Security Framework for Host Intrusion Detection

Protecting users’ systems from evolving cybercrime is becoming increasingly challenging. Attackers create more complicated attack patterns and configure attack behavior to resemble normal behavior to evade detection by defenders. Thus, it is indispensable to configure a security system th...

Bibliographic Details
Main Authors: Yongsik Kim, Su-Youn Hong, Sungjin Park, Huy Kang Kim
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Reinforcement learning; natural language processing; host-based intrusion detection system
Online Access: https://ieeexplore.ieee.org/document/10848062/
author Yongsik Kim
Su-Youn Hong
Sungjin Park
Huy Kang Kim
collection DOAJ
description Protecting users’ systems from evolving cybercrime is becoming increasingly challenging. Attackers create more complicated attack patterns and shape attack behavior to resemble normal behavior in order to evade detection by defenders. It is therefore indispensable to build a security system that accurately detects attacks on each user’s system. Because attacks do not occur only at a specific point in the network, identifying computer intrusions from network packets alone is limited. A Host-based Intrusion Detection System (HIDS) is an effective tool for monitoring computer systems and detecting unusual or unauthorized activities. A HIDS can quickly identify potential security threats by closely monitoring and analyzing system logs, configurations, file integrity, and events specific to a host machine. It helps maintain the security and integrity of individual systems by detecting unauthorized activities or policy violations, which makes it an essential part of a comprehensive host-based security strategy. Although a HIDS can detect insider intrusions, known HIDS detection methods are limited to specific attacks and may be ineffective against new attack patterns. Recently, researchers have applied Natural Language Processing (NLP) in HIDS to scrutinize complex attack patterns, but these approaches could provide more useful outputs for detecting intrusions based on those patterns. In this paper, we use the Actor-Critic reinforcement learning method and NLP to extract the keywords that occur in each anomalous system call log, and we propose a rule generation framework that uses the extracted keywords to detect future intrusions. We analyze the anomaly log using NLP and extract the characteristics of each attack log as the ‘keyword.’ Based on the unique keywords of each attack log, we use reinforcement learning to establish a set of rules to protect against attacks.
We extracted keywords from the system call log sequence using TextRank and, at the same time, used the extracted keywords to provide ground truth data. Based on the extracted keywords, the pre-trained Seq2Seq model generates rules according to the reward calculation method in reinforcement learning. To calculate the reward used to create the rule set, we combined the comparison value against the pre-trained Seq2Seq model, the malware log sequences detected by the reinforcement learning-based rule set, and the false positive value generated on normal data. We verified the proposed framework on two system call log datasets: ADFA-LD and LID-DS 2021. The proposed framework achieved an average accuracy of 96.5% across different attacks. We compared the detection accuracy of the proposed framework with that of TextRank-based and Seq2Seq model-based keyword extraction methods. The proposed framework showed relatively high accuracy against various attack logs.
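The two steps named in the description, TextRank keyword extraction over a system call sequence and a reward built from three terms, can be illustrated with a minimal sketch. This is not the authors' implementation: the co-occurrence window, the damping factor, the Jaccard overlap used as the "comparison value", the all-keywords-present rule matching, and the weights `w_sim`, `w_det`, and `w_fp` are all assumptions chosen for illustration, and the Actor-Critic and Seq2Seq components are left out.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank tokens with a TextRank-style PageRank over a co-occurrence graph."""
    # Undirected weighted graph: two distinct tokens are linked when they
    # appear within `window` positions of each other in the sequence.
    weights = defaultdict(lambda: defaultdict(float))
    for i, tok in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != tok:
                weights[tok][other] += 1.0
                weights[other][tok] += 1.0

    nodes = sorted(weights)
    if not nodes:
        return []
    scores = {n: 1.0 for n in nodes}
    out_weight = {n: sum(weights[n].values()) for n in nodes}

    # Power iteration: each node receives a share of its neighbours' scores
    # proportional to the edge weight.
    for _ in range(iterations):
        new_scores = {}
        for n in nodes:
            rank = sum(weights[m][n] / out_weight[m] * scores[m]
                       for m in weights[n])
            new_scores[n] = (1.0 - damping) + damping * rank
        scores = new_scores

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(nodes, key=lambda n: (-scores[n], n))[:top_k]


def rule_matches(rule, sequence):
    """Assumed rule semantics: a non-empty rule fires when every keyword
    in it occurs somewhere in the sequence."""
    return bool(rule) and set(rule) <= set(sequence)


def reward(rule, reference_rule, attack_logs, normal_logs,
           w_sim=1.0, w_det=1.0, w_fp=1.0):
    """Hypothetical three-term reward: similarity to a reference rule,
    plus detection rate on attack logs, minus false positive rate."""
    rule_set, ref_set = set(rule), set(reference_rule)
    union = rule_set | ref_set
    # Jaccard overlap stands in for the comparison with the pre-trained model.
    similarity = len(rule_set & ref_set) / len(union) if union else 0.0
    detection = (sum(rule_matches(rule, s) for s in attack_logs)
                 / max(len(attack_logs), 1))
    false_pos = (sum(rule_matches(rule, s) for s in normal_logs)
                 / max(len(normal_logs), 1))
    return w_sim * similarity + w_det * detection - w_fp * false_pos
```

In such a sketch, `textrank_keywords` would be called on the list of system call names from one attack trace, and `reward` would score a candidate keyword rule against labeled attack and normal traces; a rule that also fires on normal traces is penalized through the false positive term.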
format Article
id doaj-art-44e03b1f623b4e61b9cad51d6c22d505
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-44e03b1f623b4e61b9cad51d6c22d505
Record timestamp: 2025-01-28T00:01:44Z
Language: eng
Publisher: IEEE
Journal: IEEE Access
ISSN: 2169-3536
Publication date: 2025-01-01
Volume: 13
Pages: 15346-15362
DOI: 10.1109/ACCESS.2025.3532353
Document number: 10848062
Title: Reinforcement Learning-Based Generative Security Framework for Host Intrusion Detection
Authors:
Yongsik Kim (https://orcid.org/0009-0005-1558-1180), School of Cybersecurity, Korea University, Seoul, Republic of Korea
Su-Youn Hong, LIG Nex1, Yongin-si, South Korea
Sungjin Park, LIG Nex1, Yongin-si, South Korea
Huy Kang Kim (https://orcid.org/0000-0002-0760-8807), School of Cybersecurity, Korea University, Seoul, Republic of Korea
URL: https://ieeexplore.ieee.org/document/10848062/
Keywords: Reinforcement learning; natural language processing; host-based intrusion detection system
title Reinforcement Learning-Based Generative Security Framework for Host Intrusion Detection
topic Reinforcement learning
natural language processing
host-based intrusion detection system
url https://ieeexplore.ieee.org/document/10848062/