Hybrid clustering strategies for effective oversampling and undersampling in multiclass classification
Abstract Multiclass imbalance is a challenging problem in real-world datasets, where certain classes may have a low number of samples because they correspond to rare occurrences. To address the challenge of multiclass imbalance, this paper introduces a novel hybrid cluster-based oversampling and und...
Main Authors: | Amirreza Salehi, Majid Khedmati |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2025-01-01 |
Series: | Scientific Reports |
Subjects: | Multiclass classification; Imbalanced data; Oversampling; Undersampling; Ensemble |
Online Access: | https://doi.org/10.1038/s41598-024-84786-2 |
_version_ | 1832571804592373760 |
---|---|
author | Amirreza Salehi; Majid Khedmati |
author_facet | Amirreza Salehi; Majid Khedmati |
author_sort | Amirreza Salehi |
collection | DOAJ |
description | Abstract: Multiclass imbalance is a challenging problem in real-world datasets, where certain classes may have few samples because they correspond to rare occurrences. To address this challenge, this paper introduces a novel hybrid cluster-based oversampling and undersampling (HCBOU) technique. By clustering and separating classes into majority and minority categories, this algorithm retains the most information during undersampling while generating efficient data in the minority class. The classification is carried out using one-vs-one and one-vs-all decomposition schemes. Extensive experimentation was carried out on 30 datasets to evaluate the proposed algorithm's performance. The results were subsequently compared with those of several state-of-the-art algorithms. Based on the results, the proposed algorithm outperforms the competing algorithms under different scenarios. Finally, the HCBOU algorithm demonstrated robust performance across varying class imbalance levels, highlighting its effectiveness in handling imbalanced datasets. |
format | Article |
id | doaj-art-941a28c5066b45aba20ad505fa5971f4 |
institution | Kabale University |
issn | 2045-2322 |
language | English |
publishDate | 2025-01-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj-art-941a28c5066b45aba20ad505fa5971f4; 2025-02-02T12:18:56Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-01-01; 15; 1; 1; 20; 10.1038/s41598-024-84786-2; Hybrid clustering strategies for effective oversampling and undersampling in multiclass classification; Amirreza Salehi 0; Majid Khedmati 1; Department of Industrial Engineering, Sharif University of Technology; Department of Industrial Engineering, Sharif University of Technology; Abstract: Multiclass imbalance is a challenging problem in real-world datasets, where certain classes may have a low number of samples because they correspond to rare occurrences. To address the challenge of multiclass imbalance, this paper introduces a novel hybrid cluster-based oversampling and undersampling (HCBOU) technique. By clustering and separating classes into majority and minority categories, this algorithm retains the most information during undersampling while generating efficient data in the minority class. The classification is carried out using one-vs-one and one-vs-all decomposition schemes. Extensive experimentation was carried out on 30 datasets to evaluate the proposed algorithm's performance. The results were subsequently compared with those of several state-of-the-art algorithms. Based on the results, the proposed algorithm outperforms the competing algorithms under different scenarios. Finally, the HCBOU algorithm demonstrated robust performance across varying class imbalance levels, highlighting its effectiveness in handling imbalanced datasets.; https://doi.org/10.1038/s41598-024-84786-2; Multiclass classification; Imbalanced data; Oversampling; Undersampling; Ensemble |
spellingShingle | Amirreza Salehi; Majid Khedmati; Hybrid clustering strategies for effective oversampling and undersampling in multiclass classification; Scientific Reports; Multiclass classification; Imbalanced data; Oversampling; Undersampling; Ensemble |
title | Hybrid clustering strategies for effective oversampling and undersampling in multiclass classification |
title_full | Hybrid clustering strategies for effective oversampling and undersampling in multiclass classification |
title_fullStr | Hybrid clustering strategies for effective oversampling and undersampling in multiclass classification |
title_full_unstemmed | Hybrid clustering strategies for effective oversampling and undersampling in multiclass classification |
title_short | Hybrid clustering strategies for effective oversampling and undersampling in multiclass classification |
title_sort | hybrid clustering strategies for effective oversampling and undersampling in multiclass classification |
topic | Multiclass classification; Imbalanced data; Oversampling; Undersampling; Ensemble |
url | https://doi.org/10.1038/s41598-024-84786-2 |
work_keys_str_mv | AT amirrezasalehi hybridclusteringstrategiesforeffectiveoversamplingandundersamplinginmulticlassclassification AT majidkhedmati hybridclusteringstrategiesforeffectiveoversamplingandundersamplinginmulticlassclassification |