Self-Supervised Learning Meets Custom Autoencoder Classifier: A Semi-Supervised Approach for Encrypted Traffic Anomaly Detection
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11113262/ |
| Summary: | The widespread adoption of encryption in computer networks has made detecting malicious traffic, especially at network perimeters, increasingly challenging. As packet contents are concealed, traditional monitoring techniques such as Deep Packet Inspection (DPI) become ineffective. Consequently, researchers have started employing data-driven methods based on Machine and Deep Learning (ML & DL) to identify malicious behavior even from encrypted traffic, typically within Anomaly-based Network Intrusion Detection Systems (A-NIDS). Existing approaches rely heavily on supervised learning, which requires large volumes of labeled benign and malicious traffic. Generating these labels is time-consuming, error-prone, and often requires expert knowledge. In this paper, we propose a semi-supervised learning framework that leverages Self-Supervised Learning (SSL) to learn discriminative representations from unlabeled network traffic. We design a novel pretext task that predicts important masked features, enabling the model to capture meaningful structure in the data. The learned representations are fine-tuned with minimal labeled data using a Custom-Autoencoder (Custom-AE) classifier. Experimental results show that the representation learned from our proposed pretext task outperforms the best competing method in terms of accuracy by 3.41% on UNSW-NB15 (NB15) and 1.53% on CSE-CIC-IDS2018 (CSE18) when evaluated using linear probing. When fine-tuned with the Custom-AE on only 100 benign and 10 malicious samples, it achieves 83.51% (NB15) and 87.43% (CSE18) accuracy, representing gains of 4.55% and 5.08% over the initial features, respectively. This demonstrates stronger suitability for label-scarce real-world scenarios compared to existing approaches. |
|---|---|
| ISSN: | 2169-3536 |
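The summary above describes a pretext task that masks important flow features and trains a model to predict them, yielding representations that are later fine-tuned with few labels. The paper's actual architecture is not given in this record, so the following is only a minimal NumPy sketch of the masked-feature-prediction idea on synthetic correlated data; the feature dimensions, mask ratio, network sizes, and training loop are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for tabular per-flow traffic features (the paper uses
# UNSW-NB15 and CSE-CIC-IDS2018 flow statistics). A low-rank mixing makes
# features mutually predictable, so masked prediction is learnable.
latent = rng.normal(size=(256, 4))
mixing = rng.normal(size=(4, 16))
X = latent @ mixing
X /= X.std()  # roughly unit-variance features

def mask_features(X, mask_ratio=0.25, rng=rng):
    """Zero out a random subset of entries per sample (pretext-task input)."""
    mask = rng.random(X.shape) < mask_ratio  # True = masked position
    return np.where(mask, 0.0, X), mask

# One-hidden-layer autoencoder trained to predict the masked entries.
d, h = X.shape[1], 8
W1 = rng.normal(scale=0.1, size=(d, h))
W2 = rng.normal(scale=0.1, size=(h, d))
lr = 0.05
losses = []

for step in range(500):
    Xm, mask = mask_features(X)
    Z = np.tanh(Xm @ W1)   # encoder: the representation reused downstream
    Xhat = Z @ W2          # decoder: reconstruct all feature positions
    # Loss is computed only at masked positions, the defining trait of
    # masked-feature pretext tasks.
    err = (Xhat - X) * mask
    losses.append(float((err ** 2).sum() / mask.sum()))
    # Manual backprop (chain rule through tanh).
    dXhat = 2 * err / mask.sum()
    dW2 = Z.T @ dXhat
    dZ = (dXhat @ W2.T) * (1 - Z ** 2)
    dW1 = Xm.T @ dZ
    W1 -= lr * dW1
    W2 -= lr * dW2

# After pretraining, tanh(X @ W1) would serve as the learned representation,
# to be fine-tuned with the few labeled samples (e.g. a downstream classifier).
```

In the paper's semi-supervised setting, the pretrained encoder would then be fine-tuned with a Custom-Autoencoder classifier on as few as 100 benign and 10 malicious labeled samples; the sketch stops at the self-supervised stage.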