A novel oversampling method based on Wasserstein CGAN for imbalanced classification


Saved in:
Bibliographic Details
Main Authors: Hongfang Zhou, Heng Pan, Kangyun Zheng, Zongling Wu, Qingyu Xiang
Format: Article
Language: English
Published: SpringerOpen 2025-02-01
Series: Cybersecurity
Subjects:
Online Access: https://doi.org/10.1186/s42400-024-00290-0
Description
Summary: Class imbalance is a crucial challenge in classification tasks, and in recent years, with the advancements in deep learning, research on oversampling techniques based on GANs has proliferated. These techniques have proven effective in addressing class imbalance by capturing the distributional features of minority samples during training and generating high-quality new samples. However, GAN-based oversampling methods may suffer from vanishing gradients, resulting in mode collapse, and may produce noise and boundary-blurring issues when generating new samples. This paper proposes a novel oversampling method based on a conditional GAN (CGAN) incorporating the Wasserstein distance. It first generates an initial balanced dataset from minority class samples using the CGAN oversampling approach, and then applies a noise and boundary recognition method based on K-means and the k-nearest neighbors (KNN) algorithm to address the noise and boundary-blurring issues. The proposed method generates new samples that are highly consistent with the original sample distribution and effectively mitigates noisy data and class-boundary blurring. Experimental results on multiple public datasets show that the proposed method achieves significant improvements in evaluation metrics such as Recall, F1-score, G-mean, and AUC.
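The abstract's second stage, a noise and boundary recognition step combining K-means and KNN, can be illustrated with a small sketch. The paper's exact criteria are not given in this record, so the rule below is an assumption: a sample is flagged as noise/boundary only when both its K-means cluster and its k nearest neighbors are dominated by the opposite class. The function name `filter_noise`, the farthest-point K-means initialization, and the `purity` threshold are all illustrative choices, not the authors' method.

```python
import numpy as np

def kmeans_labels(X, k, iters=50, seed=0):
    """Plain K-means (numpy only). Farthest-point initialization keeps
    the demo deterministic and avoids both centers landing in one blob."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min(np.linalg.norm(X[:, None] - np.array(centers)[None], axis=2), axis=1)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def filter_noise(X, y, n_clusters=2, k=5, purity=0.5, seed=0):
    """Drop samples that look like noise or blurred-boundary points.
    Heuristic sketch: flag a sample only if BOTH its K-means cluster and
    its k nearest neighbors are mostly labeled with the other class."""
    clusters = kmeans_labels(X, n_clusters, seed=seed)
    # fraction of each sample's cluster that shares its label
    cluster_agree = np.array([(y[clusters == c] == yi).mean()
                              for c, yi in zip(clusters, y)])
    # fraction of each sample's k nearest neighbors that shares its label
    d = np.linalg.norm(X[:, None] - X[None], axis=2)
    np.fill_diagonal(d, np.inf)          # a sample is not its own neighbor
    idx = np.argsort(d, axis=1)[:, :k]
    knn_agree = (y[idx] == y[:, None]).mean(axis=1)
    keep = ~((cluster_agree < purity) & (knn_agree < purity))
    return X[keep], y[keep], keep
```

In a full pipeline this filter would run on the CGAN-balanced dataset, removing generated minority samples that fell inside majority-class regions before the classifier is trained.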
ISSN: 2523-3246