Addressing class imbalance in lassa fever epidemic data, using machine learning: a case study with SMOTE and random forest

Class imbalance in epidemiological datasets, particularly for rare outcomes like Lassa Fever fatalities, complicates predictive modeling. This study addresses the issue by employing SMOTE to rebalance the dataset and Random Forest for classification while identifying significant predictors such as...

Bibliographic Details
Main Authors: Osowomuabe Njama-Abang, Denis Ashishie, Paul Bukie
Format: Article
Language: English
Published: Nigerian Society of Physical Sciences 2025-08-01
Series: Journal of Nigerian Society of Physical Sciences
Subjects: Lassa fever; Machine learning; SMOTE; Random forest; Class imbalance
Online Access: https://journal.nsps.org.ng/index.php/jnsps/article/view/2586
author Osowomuabe Njama-Abang
Denis Ashishie
Paul Bukie
collection DOAJ
description Class imbalance in epidemiological datasets, particularly for rare outcomes like Lassa fever fatalities, complicates predictive modeling. This study addresses the issue by employing SMOTE to rebalance the dataset and Random Forest for classification while identifying significant predictors such as age, symptom severity, and residence. SMOTE successfully balanced the dataset (minority class recall improved from 0.60 to 1.00 in Random Forest), mitigating the bias toward majority classes. Without SMOTE, models including Random Forest, XGBoost, and LightGBM achieved high accuracy (> 99%) but demonstrated poor minority recall (≤ 0.75), confirming the challenge of imbalanced data. Post-SMOTE balancing, these models achieved 100% accuracy, precision, recall, and F1-scores across major classes. Notably, the hybrid ensemble model further enhanced outcomes, achieving an F1-score of 0.80 for the rarest class. These results underscore the superiority of SMOTE in improving classification for underrepresented outcomes compared to reliance on Random Forest alone, demonstrating its value in developing equitable predictive tools for outbreak management.
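The description above names SMOTE and per-class recall without showing how either works. The sketch below illustrates the general technique only and is not the authors' pipeline: it uses the Python standard library, and the toy data points, the function names `smote` and `recall_for_class`, and the parameters `k` and `seed` are all assumptions made for illustration. SMOTE creates each synthetic minority sample by interpolating between a real minority sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def recall_for_class(y_true, y_pred, label):
    """Recall = correctly predicted members of `label` / all actual members."""
    predicted_for_actual = [p for t, p in zip(y_true, y_pred) if t == label]
    hits = sum(p == label for p in predicted_for_actual)
    return hits / len(predicted_for_actual)


# Toy data (hypothetical, not from the study): 20 majority vs 4 minority points.
majority = [(float(i % 5), float(i // 5)) for i in range(20)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.0), (11.5, 11.5)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints "20 20"

# Illustrative labels only: 3 of 5 minority cases found gives recall 0.6.
example_recall = recall_for_class([1, 1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0, 0], 1)
```

In practice, oversampling is usually applied to the training split only, so that synthetic points do not leak into the evaluation data; the record does not say how the study handled this.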
format Article
id doaj-art-fda5bae63fbd42f0b4bb1a217f3a8ea9
institution Kabale University
issn 2714-2817
2714-4704
language English
publishDate 2025-08-01
publisher Nigerian Society of Physical Sciences
record_format Article
series Journal of Nigerian Society of Physical Sciences
doi 10.46481/jnsps.2025.2586
title Addressing class imbalance in lassa fever epidemic data, using machine learning: a case study with SMOTE and random forest
topic Lassa fever
Machine learning
SMOTE
Random forest
Class imbalance
url https://journal.nsps.org.ng/index.php/jnsps/article/view/2586