SMOTE vs. SMOTEENN: A Study on the Performance of Resampling Algorithms for Addressing Class Imbalance in Regression Models
Main Authors: | , , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-01-01 |
Series: | Algorithms |
Subjects: | |
Online Access: | https://www.mdpi.com/1999-4893/18/1/37 |
Summary: | Class imbalance is a prevalent challenge in machine learning: when the data distribution is skewed toward one class, models tend to prioritize the majority class and underperform on the minority classes. This bias can significantly undermine prediction accuracy in real-world scenarios, which makes robust handling of imbalanced data important for dependable results. This study examines one such scenario, real-time monitoring systems for fall risk assessment in bedridden patients, where class imbalance may compromise the effectiveness of machine learning. It compares two resampling techniques, the Synthetic Minority Oversampling Technique (SMOTE) and SMOTE combined with Edited Nearest Neighbors (SMOTEENN), in mitigating class imbalance and improving predictive performance. Using a controlled sampling strategy across various instance levels, both methods were evaluated in conjunction with decision tree regression, gradient boosting regression, and Bayesian regression models. The results indicate that SMOTEENN consistently outperforms SMOTE in terms of accuracy and mean squared error across all sample sizes and models. SMOTEENN also demonstrates healthier learning curves, suggesting improved generalization capabilities, particularly for a sampling strategy with a given number of instances. Furthermore, cross-validation analysis reveals that SMOTEENN achieves higher mean accuracy and lower standard deviation than SMOTE, indicating more stable and reliable performance. These findings suggest that SMOTEENN is a more effective technique for handling class imbalance, potentially contributing to the development of more accurate and generalizable predictive models in various applications. |
---|---|
ISSN: | 1999-4893 |
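
The two resampling techniques named in the summary can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code and not the imbalanced-learn implementation; it assumes a binary problem, Euclidean distance, k = 3, and a simple majority-vote cleaning rule, and every function name and the toy data are hypothetical. SMOTE is shown as interpolation between a minority sample and one of its nearest minority neighbours, and SMOTEENN as SMOTE followed by an Edited Nearest Neighbours pass that drops samples whose label disagrees with most of their neighbours.

```python
import math
import random


def k_nearest(point, pool, k):
    """Return indices of the (up to) k points in `pool` closest to `point`."""
    order = sorted(range(len(pool)), key=lambda i: math.dist(point, pool[i]))
    return order[:k]


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each on the segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]
        neighbour = others[rng.choice(k_nearest(base, others, k))]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def edited_nearest_neighbours(X, y, k=3):
    """Keep a sample only if a strict majority of its k nearest neighbours
    share its label (assumed cleaning rule for this sketch)."""
    keep_X, keep_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        votes = [rest_y[j] for j in k_nearest(xi, rest_X, k)]
        if votes.count(yi) * 2 > len(votes):
            keep_X.append(xi)
            keep_y.append(yi)
    return keep_X, keep_y


def smoteenn(X, y, minority_label, k=3, seed=0):
    """Oversample the minority class up to the majority count with SMOTE,
    then clean the combined set with Edited Nearest Neighbours."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    majority_count = len(X) - len(minority)
    n_new = max(0, majority_count - len(minority))
    synthetic = smote(minority, n_new, k=k, seed=seed)
    return edited_nearest_neighbours(
        X + synthetic, y + [minority_label] * n_new, k=k
    )


# Toy data, made up for illustration: a majority cluster near the origin,
# two minority points far away, and one majority-labelled point sitting
# inside the minority region.
X_demo = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8),
          (5, 5.5), (5, 5), (5, 6)]
y_demo = [0, 0, 0, 0, 0, 0, 0, 1, 1]
X_res, y_res = smoteenn(X_demo, y_demo, minority_label=1)
```

On this toy data, the SMOTE step alone would leave the majority-labelled point at (5, 5.5) in place, while the cleaning step removes it because all of its nearest neighbours carry the minority label. In practice a maintained library implementation would normally be preferred over a hand-written sketch like this one.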