Hybrid clustering strategies for effective oversampling and undersampling in multiclass classification

Abstract Multiclass imbalance is a challenging problem in real-world datasets, where certain classes may have a low number of samples because they correspond to rare occurrences. To address the challenge of multiclass imbalance, this paper introduces a novel hybrid cluster-based oversampling and und...

Bibliographic Details
Main Authors: Amirreza Salehi, Majid Khedmati
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: Scientific Reports
Subjects: Multiclass classification, Imbalanced data, Oversampling, Undersampling, Ensemble
Online Access: https://doi.org/10.1038/s41598-024-84786-2
author Amirreza Salehi
Majid Khedmati
collection DOAJ
description Abstract Multiclass imbalance is a challenging problem in real-world datasets, where certain classes may have few samples because they correspond to rare occurrences. To address this challenge, this paper introduces a novel hybrid cluster-based oversampling and undersampling (HCBOU) technique. By clustering and separating classes into majority and minority categories, the algorithm retains the most information during undersampling while generating efficient data in the minority class. Classification is performed using one-vs-one and one-vs-all decomposition schemes. The proposed algorithm was evaluated through extensive experiments on 30 datasets, and its results were compared with those of several state-of-the-art algorithms. Based on the results, the proposed algorithm outperforms the competing algorithms under different scenarios. Finally, the HCBOU algorithm demonstrated robust performance across varying class imbalance levels, highlighting its effectiveness in handling imbalanced datasets.
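The two decomposition schemes named in the abstract, and the idea of separating classes into majority and minority groups, can be illustrated with a small sketch. This is not the authors' HCBOU implementation: the abstract does not state how classes are separated, so the threshold used here (mean class size) is an assumption made only for illustration, and the function names and toy labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def split_classes(labels):
    """Split class labels into majority and minority groups.

    Assumption (not taken from the paper): a class counts as 'majority'
    when it has more samples than the mean class size.
    """
    counts = Counter(labels)
    target = len(labels) / len(counts)  # mean number of samples per class
    majority = sorted(c for c, n in counts.items() if n > target)
    minority = sorted(c for c, n in counts.items() if n <= target)
    return majority, minority, target


def one_vs_one(classes):
    """One binary problem per unordered pair of classes: K*(K-1)/2 in total."""
    return list(combinations(sorted(classes), 2))


def one_vs_all(classes):
    """One binary problem per class (that class against the rest): K in total."""
    ordered = sorted(classes)
    return [(c, [o for o in ordered if o != c]) for c in ordered]


# Hypothetical toy labels: three classes with 60, 25 and 5 samples.
labels = ["a"] * 60 + ["b"] * 25 + ["c"] * 5
majority, minority, target = split_classes(labels)
ovo = one_vs_one(set(labels))
ova = one_vs_all(set(labels))
```

In a hybrid scheme of this general kind, the majority group would be the candidate for undersampling and the minority group the candidate for oversampling, after which each binary subproblem produced by `one_vs_one` or `one_vs_all` is handed to a base classifier.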
format Article
id doaj-art-941a28c5066b45aba20ad505fa5971f4
institution Kabale University
issn 2045-2322
language English
publishDate 2025-01-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj-art-941a28c5066b45aba20ad505fa5971f4; 2025-02-02T12:18:56Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-01-01; 151120; 10.1038/s41598-024-84786-2; Hybrid clustering strategies for effective oversampling and undersampling in multiclass classification; Amirreza Salehi (Department of Industrial Engineering, Sharif University of Technology); Majid Khedmati (Department of Industrial Engineering, Sharif University of Technology); https://doi.org/10.1038/s41598-024-84786-2; Multiclass classification; Imbalanced data; Oversampling; Undersampling; Ensemble
title Hybrid clustering strategies for effective oversampling and undersampling in multiclass classification
topic Multiclass classification
Imbalanced data
Oversampling
Undersampling
Ensemble
url https://doi.org/10.1038/s41598-024-84786-2