Efficient anomaly detection in tabular cybersecurity data using large language models

Abstract In cybersecurity, anomaly detection in tabular data is essential for ensuring information security. While traditional machine learning and deep learning methods have shown some success, they continue to face significant challenges in terms of generalization. To address these limitations, th...

Full description

Saved in:
Bibliographic Details
Main Authors: Xiaoyong Zhao, Xingxin Leng, Lei Wang, Ningning Wang, Yanqiong Liu
Format: Article
Language:English
Published: Nature Portfolio 2025-01-01
Series:Scientific Reports
Subjects:
Online Access:https://doi.org/10.1038/s41598-025-88050-z
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Abstract In cybersecurity, anomaly detection in tabular data is essential for ensuring information security. While traditional machine learning and deep learning methods have shown some success, they continue to face significant challenges in terms of generalization. To address these limitations, this paper presents an innovative method for tabular data anomaly detection based on large language models, called “Tabular Anomaly Detection via Guided Prompts” (TAD-GP). This approach utilizes a 7-billion-parameter open-source model and incorporates strategies such as data sample introduction, anomaly type recognition, chain-of-thought reasoning, multi-turn dialogue, and key information reinforcement. Experimental results indicate that the TAD-GP framework improves F1 scores by 79.31%, 97.96%, and 59.09% on the CICIDS2017, KDD Cup 1999, and UNSW-NB15 datasets, respectively. Furthermore, the smaller-scale TAD-GP model outperforms larger models across multiple datasets, demonstrating its practical potential in environments with constrained computational resources and requirements for private deployment. This method addresses a critical gap in research on anomaly detection in cybersecurity, specifically using small-scale open-source models.
ISSN:2045-2322