Imbalanced Data Problem in Machine Learning: A Review

Bibliographic Details
Main Authors: Manahel Altalhan, Abdulmohsen Algarni, Monia Turki-Hadj Alouane
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10845793/
Description
Summary: One of the prominent challenges encountered in real-world data is an imbalance, characterized by unequal distribution of observations across different target classes, which complicates achieving accurate model classifications. This survey delves into various machine learning techniques developed to address the difficulties posed by imbalanced data. It discusses data-level methods such as oversampling and undersampling, algorithm-level solutions including ensemble learning and specific algorithm adjustments, cost-sensitive algorithms, and hybrid strategies that combine multiple approaches. Moreover, this paper emphasizes the crucial role of evaluation methods like Precision, F1 Score, Recall, G-mean, and AUC in measuring the effectiveness of these strategies under imbalanced conditions. A detailed review of recent research articles helps pinpoint persistent gaps in generalizability, scalability, and robustness across these methods, underscoring the necessity for ongoing improvements. The survey seeks to offer an extensive overview of current approaches that improve the efficiency and effectiveness of machine learning models dealing with imbalanced datasets, thus equipping researchers with the insights needed to develop robust and effective models ready for real-world application.
ISSN: 2169-3536