Angus: efficient active learning strategies for provenance based intrusion detection

Abstract As modern attack methods become more concealed and complex, obtaining many labeled samples in big data streams is difficult. Active learning has long been used to achieve better intrusion detection performance by using only a small number of training samples. Intrusion behaviors can be desc...

Bibliographic Details
Main Authors: Lin Wu, Yulai Xie, Jin Li, Dan Feng, Jinyuan Liang, Yafeng Wu
Format: Article
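Language: English
Published: SpringerOpen 2025-01-01
Series: Cybersecurity
Subjects: Provenance; Intrusion detection; Active learning; The most similar graph query strategy; The maximum difference query strategy
Online Access: https://doi.org/10.1186/s42400-024-00311-y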
_version_ 1832571577546309632
author Lin Wu
Yulai Xie
Jin Li
Dan Feng
Jinyuan Liang
Yafeng Wu
author_sort Lin Wu
collection DOAJ
description Abstract As modern attack methods become more concealed and complex, obtaining many labeled samples in big data streams is difficult. Active learning has long been used to achieve better intrusion detection performance by using only a small number of training samples. Intrusion behaviors can be described by provenance graphs that record the dependency relationships between intrusion processes and the infected files. It is a challenge to develop active learning strategies that consider defining and selecting the most valuable provenance and ensure that the strategy for querying provenance is efficient. We present Angus, an active learning framework for provenance-based intrusion detection. We propose two novel active learning strategies: the most similar graph query strategy and the maximum difference query strategy. They either select samples to update the training set according to similarities of provenance graphs or preferentially select samples with low redundancy and large differences from the current training set. Besides, we also improve the above query strategies by using the parallel query to reduce detection time overheads. The experiments on various real-world applications demonstrate their performance and efficiency.
format Article
id doaj-art-4301928d28ed4ecd9744b08b5c68894e
institution Kabale University
issn 2523-3246
language English
publishDate 2025-01-01
publisher SpringerOpen
record_format Article
series Cybersecurity
spelling doaj-art-4301928d28ed4ecd9744b08b5c68894e
2025-02-02T12:30:05Z
eng
SpringerOpen
Cybersecurity
2523-3246
2025-01-01
81117
10.1186/s42400-024-00311-y
Angus: efficient active learning strategies for provenance based intrusion detection
Lin Wu (0), Yulai Xie (1), Jin Li (2), Dan Feng (3), Jinyuan Liang (4), Yafeng Wu (5)
(0) Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology
(1) Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology
(2) Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology
(3) School of Science and Technology, Wuhan National Laboratory for Optoelectronics, Key Laboratory of Information Storage, Huazhong University of Science and Technology
(4) University of British Columbia Vancouver British Columbia
(5) Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology
title Angus: efficient active learning strategies for provenance based intrusion detection
title_sort angus efficient active learning strategies for provenance based intrusion detection
topic Provenance
Intrusion detection
Active learning
The most similar graph query strategy
The maximum difference query strategy
url https://doi.org/10.1186/s42400-024-00311-y
work_keys_str_mv AT linwu angusefficientactivelearningstrategiesforprovenancebasedintrusiondetection
AT yulaixie angusefficientactivelearningstrategiesforprovenancebasedintrusiondetection
AT jinli angusefficientactivelearningstrategiesforprovenancebasedintrusiondetection
AT danfeng angusefficientactivelearningstrategiesforprovenancebasedintrusiondetection
AT jinyuanliang angusefficientactivelearningstrategiesforprovenancebasedintrusiondetection
AT yafengwu angusefficientactivelearningstrategiesforprovenancebasedintrusiondetection
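
The abstract in this record names two query strategies (most similar graph, maximum difference) and a parallel query, but gives no algorithmic detail. The following is only a minimal sketch of the general idea, not the authors' method. Everything in it is an assumption made for illustration: provenance graphs are modelled as sets of (source, relation, target) edges, similarity is plain Jaccard overlap, the function names are hypothetical, and a thread pool stands in for the parallel query.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: a provenance graph is a frozenset of (source, relation, target) edges.
# Assumption: Jaccard overlap of edge sets stands in for graph similarity.


def jaccard(a, b):
    """Jaccard similarity of two edge sets; defined as 1.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def max_similarity(graph, reference):
    """Highest similarity between `graph` and any graph in `reference` (0.0 if empty)."""
    return max((jaccard(graph, r) for r in reference), default=0.0)


def most_similar_query(pool, training, k, workers=4):
    """Return indices of the k pool graphs most similar to the training set.

    Scores are computed concurrently as a stand-in for a parallel query.
    """
    with ThreadPoolExecutor(max_workers=workers) as executor:
        scores = list(executor.map(lambda g: max_similarity(g, training), pool))
    ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def max_difference_query(pool, training, k):
    """Greedily return indices of k pool graphs that differ most from the
    training set and from the graphs already chosen (low redundancy)."""
    reference = list(training)
    remaining = list(range(len(pool)))
    chosen = []
    while remaining and len(chosen) < k:
        best = min(remaining, key=lambda i: max_similarity(pool[i], reference))
        chosen.append(best)
        remaining.remove(best)
        reference.append(pool[best])  # later picks must also differ from this one
    return chosen


if __name__ == "__main__":
    training = [frozenset({("bash", "exec", "ls"), ("ls", "read", "/etc/passwd")})]
    pool = [
        frozenset({("bash", "exec", "ls"), ("ls", "read", "/etc/passwd")}),
        frozenset({("bash", "exec", "ls"), ("ls", "read", "/tmp/a")}),
        frozenset({("nc", "connect", "10.0.0.5"), ("nc", "write", "/tmp/x")}),
    ]
    print(most_similar_query(pool, training, k=2))
    print(max_difference_query(pool, training, k=2))
```

In CPython, threads do not speed up CPU-bound scoring; a process pool or a vectorised similarity computation would be the more realistic choice if the similarity measure were expensive.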