Safe Semi-Supervised Contrastive Learning Using In-Distribution Data as Positive Examples
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11016683/ |
| Summary: | Semi-supervised learning (SSL) methods have shown promising results in solving many practical problems when only a few labels are available. Existing methods assume that the class distributions of labeled and unlabeled data are equal; however, their performance degrades significantly in class distribution mismatch scenarios, where out-of-distribution (OOD) data exist in the unlabeled data. Previous safe SSL studies have addressed this problem by making OOD data less likely to affect training based on the labeled data. However, even when these studies effectively filter out the unnecessary OOD data, they can lose the basic information that all data share regardless of class. To this end, we propose applying a self-supervised contrastive learning (SSCL) approach to fully exploit the large amount of unlabeled data. We also propose a contrastive loss function with a coefficient schedule that turns labeled negative examples sharing the anchor's class into positive examples. To evaluate the performance of the proposed method, we conduct experiments on image classification datasets (CIFAR-10, CIFAR-100, Tiny ImageNet, and CIFAR-100+Tiny ImageNet) under various mismatch ratios. The results show that SSCL significantly improves classification accuracy, and our proposed loss function further enhances the performance, collectively outperforming existing methods by 2-9% across various benchmark datasets. The performance gains become more pronounced as dataset complexity increases and remain robust even in challenging cross-dataset scenarios. |
| ISSN: | 2169-3536 |
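
As a rough illustration of the loss described in the summary above, the sketch below implements a SimCLR-style InfoNCE objective in which labeled examples sharing the anchor's class are promoted from negatives to positives, weighted by a ramp-up coefficient schedule. This is only one plausible reading of the abstract, not the authors' implementation; the function name, the batch layout (two augmented views stacked as [view1; view2]), the linear ramp schedule, and all hyperparameters are assumptions.

```python
import torch
import torch.nn.functional as F

def scheduled_contrastive_loss(z, labels, epoch, total_epochs, temperature=0.5):
    """Illustrative sketch only (not the paper's code).

    z:      (2N, D) embeddings of two augmented views stacked as [view1; view2]
    labels: (2N,) class indices for labeled samples, -1 for unlabeled samples
    Combines the usual instance-level InfoNCE term with a class-level term that
    treats labeled samples of the anchor's class as extra positives, weighted by
    a coefficient that ramps up over training (assumed schedule).
    """
    z = F.normalize(z, dim=1)                        # work in cosine-similarity space
    sim = z @ z.t() / temperature
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))  # never contrast an anchor with itself

    # log-softmax over each anchor's candidate examples
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # instance-level positive: the other augmented view of the same image
    idx = torch.arange(n, device=z.device)
    pos_idx = (idx + n // 2) % n
    inst_loss = -log_prob[idx, pos_idx]

    # class-level positives: labeled examples sharing the anchor's class
    labeled = labels >= 0
    same_class = labels.unsqueeze(0) == labels.unsqueeze(1)
    same_class &= labeled.unsqueeze(0) & labeled.unsqueeze(1)
    same_class &= ~self_mask
    cls_log_prob = torch.where(same_class, log_prob, torch.zeros_like(log_prob))
    cls_loss = -cls_log_prob.sum(dim=1) / same_class.sum(dim=1).clamp(min=1)

    # assumed coefficient schedule: linear ramp over the first half of training
    lam = min(1.0, epoch / max(1, total_epochs // 2))
    return (inst_loss + lam * cls_loss).mean()
```

Under this reading, early training stays close to plain self-supervised contrastive learning, and the scarce labels are trusted only gradually as the coefficient ramps up, which is one way to interpret the "coefficient schedule" mentioned in the abstract.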