Reinforcement Learning-Based Generative Security Framework for Host Intrusion Detection
Main Authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
Online Access: | https://ieeexplore.ieee.org/document/10848062/ |
Summary: | Protecting users’ systems from evolving cybercrime is becoming increasingly challenging. Attackers create more complicated attack patterns and configure attack behavior to resemble normal behavior to evade detection by defenders. Thus, it is indispensable to configure a security system that accurately detects attacks on each user’s system. Because attacks do not occur only at a specific point in the network, identifying intrusions from network packets alone is limited. A Host-based Intrusion Detection System (HIDS) is a highly effective tool for monitoring computer systems and detecting unusual or unauthorized activities. HIDS can quickly identify potential security threats by closely monitoring and analyzing system logs, configurations, file integrity, and events specific to a host machine. It helps maintain the security and integrity of individual systems by detecting unauthorized activities or policy violations. With its advanced capabilities and reliable performance, HIDS is essential to any comprehensive host-based security strategy. Although HIDS can detect insider intrusions, known HIDS detection methods are limited to specific attacks and may be ineffective against new attack patterns. Recently, researchers have applied Natural Language Processing (NLP) in HIDS to scrutinize complex attack patterns, but these approaches have not effectively turned those patterns into outputs that are useful for detecting intrusions. In this paper, we use the Actor-Critic reinforcement learning method and NLP to extract keywords that occur in each anomalous system call log, and we propose a rule generation framework that uses the extracted keywords to prevent future intrusions. We analyze the anomaly log using NLP and extract the characteristics of each attack log as the ‘keyword.’ Based on the unique keywords of each attack log, we utilize reinforcement learning to establish a set of rules to protect against attacks. 
We extracted keywords from the system call log sequence using TextRank and simultaneously used the extracted keywords to provide ground truth data. Based on the extracted keywords, the pre-trained Seq2Seq model generates rules according to the reward calculation method in reinforcement learning. When calculating the reward, we used the comparison value with the pre-trained Seq2Seq model, the malware log sequences detected by the reinforcement learning-based rule set, and the false positive value generated on normal data, so that the framework creates its own rule set. We verified the proposed framework using two system call log datasets: ADFA-LD and LID-DS 2021. The proposed framework achieved an average accuracy of 96.5% across different attacks. We compared the accuracy of detection based on the proposed framework with that of TextRank and Seq2Seq model-based keyword extraction methods. As a result, the proposed framework showed relatively high accuracy against various attack logs. |
---|---|
ISSN: | 2169-3536 |
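
The summary mentions TextRank-based keyword extraction from system call log sequences. The following is a minimal sketch of plain, unweighted TextRank over a token sequence, not the authors' implementation. The function name `textrank_keywords`, its parameters (`window`, `damping`, `iterations`, `top_k`), and the sample system call sequence are assumptions made for illustration only.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=3):
    """Rank tokens with unweighted TextRank (co-occurrence graph + PageRank).

    Hypothetical helper for illustration; parameters are assumed defaults.
    """
    # Build an undirected co-occurrence graph: two distinct tokens are
    # linked if they appear within `window` positions of each other.
    neighbors = defaultdict(set)
    for i, tok in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != tok:
                neighbors[tok].add(tokens[j])
                neighbors[tokens[j]].add(tok)

    nodes = sorted(neighbors)
    if not nodes:
        return []

    # Iterate the PageRank update a fixed number of times.
    scores = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new_scores = {}
        for n in nodes:
            rank = sum(scores[m] / len(neighbors[m]) for m in neighbors[n])
            new_scores[n] = (1 - damping) + damping * rank
        scores = new_scores

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(nodes, key=lambda n: (-scores[n], n))
    return ranked[:top_k]


# Made-up system call sequence, not taken from ADFA-LD or LID-DS 2021.
sample = ["open", "read", "open", "write", "open", "mmap", "open", "close"]
print(textrank_keywords(sample, window=1, top_k=1))
```

In this sample, `open` is adjacent to every other call, so it ranks first. A real pipeline would presumably tokenize actual log sequences and might weight edges by co-occurrence count; both choices are left out of this sketch.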