Decision tree-based learning and laboratory data mining: an efficient approach to amebiasis testing

Bibliographic Details
Main Authors: Enas Al-khlifeh, Ahmad S. Tarawneh, Khalid Almohammadi, Malek Alrashidi, Ramadan Hassanat, Ahmad B. Hassanat
Format: Article
Language: English
Published: BMC 2025-01-01
Series: Parasites & Vectors
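Subjects: Amebiasis; E. histolytica; Electronic medical records (EMR); Microscopic diagnosis; Machine learning; Feature selection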
Online Access:https://doi.org/10.1186/s13071-024-06618-6
collection DOAJ
description Abstract
Background: Amebiasis is a significant global health concern, especially in developing countries, where infections are more common. The primary laboratory diagnostic method is microscopic examination of stool samples; however, this approach can sometimes lead to amebiasis being misinterpreted as another gastroenteritis (GE) condition. The goal of this work is to produce a machine learning (ML) model that uses laboratory findings and demographic information to predict amebiasis automatically.
Methods: Data extracted from Jordanian electronic medical records (EMR) between 2020 and 2022 comprised 763 amebic cases and 314 nonamebic cases. Patient demographics, clinical signs, microscopic diagnoses, and leukocyte counts were used to train eight decision tree-based algorithms, and their prediction accuracy was compared. Feature ranking and correlation methods were applied to improve the accuracy of distinguishing amebiasis from other conditions.
Results: The main predictor variables distinguishing amebiasis were the percentage of neutrophils, the presence of mucus, and the counts of red blood cells (RBCs) and white blood cells (WBCs) in stool samples. With tree-based classifiers, including decision tree (DT), random forest (RF), XGBoost, AdaBoost, and gradient boosting (GB), prediction accuracy and precision ranged from 92% to 94.6%. The optimized RF model, using only 300 estimators with a maximum depth of 20, achieved an area under the curve (AUC) of 0.98 for detecting amebiasis from laboratory data. Amebiasis accounted for 17.22% of all gastroenteritis episodes in this study, highlighting it as a significant health concern in Jordan. Male sex and age were associated with a higher incidence of amebiasis (P = 0.014), with over 25% of cases occurring in infants and toddlers.
Conclusions: Applying ML to EMR data can accurately predict amebiasis. This finding contributes to the emerging use of ML as a decision support system in the diagnosis of parasitic diseases.
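Two of the reported figures can be put in context with a short calculation. The sketch below is illustrative only and is not the authors' code or evaluation protocol. It computes the accuracy that a trivial classifier always predicting "amebic" would reach given the stated counts (763 amebic vs 314 nonamebic, about 70.8%), which is a useful floor against which to read the reported 92% to 94.6%; it assumes the raw class ratio, whereas the study's own test split or resampling may differ. It also computes AUC using its rank (Mann-Whitney) interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The function names and the toy score lists are hypothetical and are not study data.

```python
from typing import Sequence

# Case counts stated in the abstract (Jordanian EMR, 2020-2022).
AMEBIC_CASES = 763
NONAMEBIC_CASES = 314


def majority_baseline_accuracy(n_pos: int, n_neg: int) -> float:
    """Accuracy of a trivial classifier that always predicts the larger class."""
    return max(n_pos, n_neg) / (n_pos + n_neg)


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the Mann-Whitney statistic: the fraction of (positive, negative)
    pairs in which the positive case scores higher, with ties counted as 1/2.
    Uses an O(n*m) pairwise loop, which is fine for illustration."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    baseline = majority_baseline_accuracy(AMEBIC_CASES, NONAMEBIC_CASES)
    print(f"majority-class baseline accuracy: {baseline:.3f}")  # 0.708
    # Hypothetical classifier scores, NOT taken from the study.
    amebic = [0.95, 0.90, 0.80, 0.40]
    nonamebic = [0.10, 0.30, 0.50]
    print(f"AUC on toy scores: {roc_auc(amebic, nonamebic):.3f}")  # 0.917
```

On the toy scores, 11 of the 12 positive/negative pairs are ranked correctly, so the AUC is 11/12. On this reading, an AUC of 0.98 means that about 98% of such pairs are ordered correctly by the model's scores.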
id doaj-art-0f31f6a8a9fc41f3ba26afc3e58540df
institution Kabale University
issn 1756-3305
affiliation Enas Al-khlifeh, Department of Applied Biology, Al-Balqa Applied University
Ahmad S. Tarawneh, Faculty of Information Technology, Mutah University
Khalid Almohammadi, Computer Science Department, Applied College, University of Tabuk
Malek Alrashidi, Computer Science Department, Applied College, University of Tabuk
Ramadan Hassanat, General Surgery Department, Jordanian Royal Medical Service
Ahmad B. Hassanat, Faculty of Information Technology, Mutah University
topic Amebiasis
E. histolytica
Electronic medical records (EMR)
Microscopic diagnosis
Machine learning
Feature selection