Addressing class imbalance in lassa fever epidemic data, using machine learning: a case study with SMOTE and random forest
Class imbalance in epidemiological datasets, particularly for rare outcomes like Lassa Fever fatalities, complicates predictive modeling. This study addresses the issue by employing SMOTE to rebalance the dataset and Random Forest for classification while identifying significant predictors such as...
| Main Authors: | Osowomuabe Njama-Abang, Denis Ashishie, Paul Bukie |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nigerian Society of Physical Sciences, 2025-08-01 |
| Series: | Journal of Nigerian Society of Physical Sciences |
| Subjects: | Lassa fever; Machine learning; SMOTE; Random forest; Class imbalance |
| Online Access: | https://journal.nsps.org.ng/index.php/jnsps/article/view/2586 |
| _version_ | 1849431335256457216 |
|---|---|
| author | Osowomuabe Njama-Abang; Denis Ashishie; Paul Bukie |
| collection | DOAJ |
| description | Class imbalance in epidemiological datasets, particularly for rare outcomes like Lassa Fever fatalities, complicates predictive modeling. This study addresses the issue by employing SMOTE to rebalance the dataset and Random Forest for classification while identifying significant predictors such as age, symptom severity, and residence. SMOTE successfully balanced the dataset (minority class recall improved from 0.60 to 1.00 in Random Forest), mitigating the bias toward majority classes. Without SMOTE, models including Random Forest, XGBoost, and LightGBM achieved high accuracy (>99%) but demonstrated poor minority recall (≤0.75), confirming the challenge of imbalanced data. Post-SMOTE balancing, these models achieved 100% accuracy, precision, recall, and F1-scores across major classes. Notably, the hybrid ensemble model further enhanced outcomes, achieving an F1-score of 0.80 for the rarest class. These results underscore the superiority of SMOTE in improving classification for underrepresented outcomes compared to reliance on Random Forest alone, demonstrating its value in developing equitable predictive tools for outbreak management. |
| format | Article |
| id | doaj-art-fda5bae63fbd42f0b4bb1a217f3a8ea9 |
| institution | Kabale University |
| issn | 2714-2817 2714-4704 |
| language | English |
| publishDate | 2025-08-01 |
| publisher | Nigerian Society of Physical Sciences |
| record_format | Article |
| series | Journal of Nigerian Society of Physical Sciences |
| spelling | doaj-art-fda5bae63fbd42f0b4bb1a217f3a8ea9; 2025-08-20T03:27:40Z; eng; Nigerian Society of Physical Sciences; Journal of Nigerian Society of Physical Sciences; 2714-2817; 2714-4704; 2025-08-01; 7; 3; 10.46481/jnsps.2025.2586 |
| title | Addressing class imbalance in lassa fever epidemic data, using machine learning: a case study with SMOTE and random forest |
| topic | Lassa fever; Machine learning; SMOTE; Random forest; Class imbalance |
| url | https://journal.nsps.org.ng/index.php/jnsps/article/view/2586 |
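
The abstract contrasts high overall accuracy with poor minority-class recall and credits SMOTE with closing that gap. The following is a minimal, standard-library-only sketch of those two ideas, not the authors' implementation: this record does not describe their code, and the function names (`smote`, `recall`), the toy class counts, and the two-feature points are all hypothetical. In practice one would more likely use the `SMOTE` class from imbalanced-learn together with scikit-learn's `RandomForestClassifier`.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def recall(y_true, y_pred, label):
    """Fraction of true `label` cases that were predicted as `label`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    total = sum(1 for t in y_true if t == label)
    return hits / total


# Hypothetical toy labels: 990 majority (0) and 10 minority (1) cases.
y_true = [0] * 990 + [1] * 10
# A classifier that gets every majority case right but misses 4 of 10
# minority cases still looks excellent on accuracy alone.
y_pred = [0] * 990 + [1] * 6 + [0] * 4
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
minority_recall = recall(y_true, y_pred, label=1)
# accuracy is 0.996 while minority_recall is 0.6

# Hypothetical two-feature minority samples, oversampled to match the
# majority count (10 real + 980 synthetic = 990).
minority = [(float(i), float(i % 3)) for i in range(10)]
balanced_minority = minority + smote(minority, n_new=980)
```

Oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the data used for evaluation.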