Self-Supervised Learning Meets Custom Autoencoder Classifier: A Semi-Supervised Approach for Encrypted Traffic Anomaly Detection


Bibliographic Details
Main Authors: A. Ramzi Bahlali, Abdelmalik Bachir, Abdeldjalil Labed
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11113262/
Description
Summary: The widespread adoption of encryption in computer networks has made detecting malicious traffic, especially at network perimeters, increasingly challenging. As packet contents are concealed, traditional monitoring techniques such as Deep Packet Inspection (DPI) become ineffective. Consequently, researchers have started employing data-driven methods based on Machine and Deep Learning (ML & DL) to identify malicious behavior even from encrypted traffic, typically within Anomaly-based Network Intrusion Detection Systems (A-NIDS). Existing approaches rely heavily on supervised learning, which requires large volumes of labeled benign and malicious traffic. Generating these labels is time-consuming, error-prone, and often requires expert knowledge. In this paper, we propose a semi-supervised learning framework that leverages Self-Supervised Learning (SSL) to learn discriminative representations from unlabeled network traffic. We design a novel pretext task that predicts important masked features, enabling the model to capture meaningful structure in the data. The learned representations are fine-tuned with minimal labeled data using a Custom-Autoencoder (Custom-AE) classifier. Experimental results show that the representation learned from our proposed pretext task outperforms the best competing method in terms of accuracy by 3.41% on UNSW-NB15 (NB15) and 1.53% on CSE-CIC-IDS2018 (CSE18) when evaluated using linear probing. When fine-tuned with the Custom-AE on only 100 benign and 10 malicious samples, it achieves 83.51% (NB15) and 87.43% (CSE18) accuracy, representing gains of 4.55% and 5.08% over the initial features, respectively. This demonstrates stronger suitability for label-scarce real-world scenarios compared to existing approaches.
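The masked-feature pretext task described in the summary can be sketched in a few lines of numpy. This is an illustrative toy only: the synthetic data, network size, masking ratio, and training loop are assumptions for demonstration, not the authors' actual architecture, and the downstream Custom-AE fine-tuning stage is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for unlabeled flow features: 8 correlated columns driven by
# 3 latent factors, so hidden values are predictable from visible ones.
latent = rng.normal(size=(512, 3))
X = latent @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(512, 8))

def mask_features(X, mask_ratio=0.3, rng=rng):
    """Hide a random subset of features per sample (set to 0); the hidden
    positions are the pretext-task prediction targets."""
    mask = rng.random(X.shape) < mask_ratio
    return np.where(mask, 0.0, X), mask

def forward(Xc, W1, b1, W2, b2):
    Z = np.tanh(Xc @ W1 + b1)   # learned representation (used downstream)
    return Z, Z @ W2 + b2       # reconstruction of all features

def masked_loss(X_hat, X, mask):
    # Mean squared error, computed only on the masked positions.
    return float((((X_hat - X) * mask) ** 2).sum() / mask.sum())

d, h = X.shape[1], 16
W1 = 0.1 * rng.normal(size=(d, h)); b1 = np.zeros(h)
W2 = 0.1 * rng.normal(size=(h, d)); b2 = np.zeros(d)

# Fixed corruption for before/after evaluation of the pretext objective.
eval_corrupt, eval_mask = mask_features(X)
loss_before = masked_loss(forward(eval_corrupt, W1, b1, W2, b2)[1], X, eval_mask)

lr = 0.3
for _ in range(500):
    Xc, mask = mask_features(X)                  # fresh random mask each step
    Z, X_hat = forward(Xc, W1, b1, W2, b2)
    g_out = 2 * (X_hat - X) * mask / mask.sum()  # dLoss/dX_hat
    g_hid = (g_out @ W2.T) * (1 - Z ** 2)        # backprop through tanh
    W2 -= lr * (Z.T @ g_out);  b2 -= lr * g_out.sum(0)
    W1 -= lr * (Xc.T @ g_hid); b1 -= lr * g_hid.sum(0)

loss_after = masked_loss(forward(eval_corrupt, W1, b1, W2, b2)[1], X, eval_mask)
print(f"masked reconstruction loss: {loss_before:.3f} -> {loss_after:.3f}")
```

After pretraining, the encoder output `Z` is what the paper's pipeline would feed to linear probing or to the Custom-AE classifier; here it is simply a hidden layer of a hand-rolled autoencoder trained only where features were masked.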
ISSN:2169-3536