Angus: efficient active learning strategies for provenance-based intrusion detection

Bibliographic Details
Main Authors: Lin Wu, Yulai Xie, Jin Li, Dan Feng, Jinyuan Liang, Yafeng Wu
Format: Article
Language: English
Published: SpringerOpen 2025-01-01
Series: Cybersecurity
Online Access: https://doi.org/10.1186/s42400-024-00311-y
Description
Summary: As modern attack methods become more concealed and complex, obtaining many labeled samples from big data streams is difficult. Active learning has long been used to achieve better intrusion detection performance with only a small number of training samples. Intrusion behaviors can be described by provenance graphs, which record the dependency relationships between intrusion processes and the files they infect. The challenge is to design active learning strategies that define and select the most valuable provenance graphs while keeping the querying of provenance efficient. We present Angus, an active learning framework for provenance-based intrusion detection. We propose two novel active learning strategies: the most similar graph query strategy and the maximum difference query strategy. The former selects samples to update the training set according to the similarity of their provenance graphs; the latter preferentially selects samples that have low redundancy and differ most from the current training set. We further improve both query strategies with parallel queries to reduce detection time overhead. Experiments on various real-world applications demonstrate their performance and efficiency.
ISSN: 2523-3246
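The maximum difference query strategy described in the summary can be illustrated with a minimal sketch. It assumes that each provenance graph has already been embedded as a fixed-length numeric vector and uses Euclidean distance as the measure of difference. The record does not say how Angus represents or compares graphs, so both choices are assumptions. The hypothetical function `max_difference_query` applies a generic farthest-first (max-min) selection, not the authors' implementation. It repeatedly picks the unlabeled sample farthest from its nearest training sample, then treats that pick as covered so later picks avoid redundancy with it.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    # Assumed difference measure between two graph embeddings.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_difference_query(unlabeled: List[Vector],
                         training: List[Vector],
                         budget: int) -> List[int]:
    """Hypothetical sketch: return indices of up to `budget` unlabeled samples,
    chosen greedily so each pick is as far as possible from everything
    already in (or already queued for) the training set."""
    # Distance from each unlabeled sample to its nearest training sample;
    # infinity when the training set is empty.
    nearest = [min((euclidean(u, t) for t in training), default=math.inf)
               for u in unlabeled]
    selected: List[int] = []
    for _ in range(min(budget, len(unlabeled))):
        # Pick the not-yet-selected sample that differs most from the covered set.
        best = max((i for i in range(len(unlabeled)) if i not in selected),
                   key=lambda i: nearest[i])
        selected.append(best)
        # Count the pick as covered, so near-duplicates of it lose priority.
        for i, u in enumerate(unlabeled):
            nearest[i] = min(nearest[i], euclidean(u, unlabeled[best]))
    return selected


if __name__ == "__main__":
    training = [[0.0, 0.0]]
    unlabeled = [[0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [-3.0, 0.0]]
    print(max_difference_query(unlabeled, training, 2))
```

With a query budget of 2, the example selects indices 2 and 3. It skips index 1 even though that sample is far from the training set, because it nearly duplicates index 2. This is the "low redundancy, large difference" behaviour the summary describes. A parallel variant, as the summary mentions, would need a batching scheme that the record does not specify.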