Modeling the determinants of attrition in a two-stage epilepsy prevalence survey in Nairobi using machine learning
Background: Attrition is a challenge in parameter estimation in both longitudinal and multi-stage cross-sectional studies. Here, we examine the utility of machine learning to predict attrition and identify associated factors in a two-stage population-based epilepsy prevalence study in Nairobi. Methods:...
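Main Authors: | Daniel M. Mwanga, Isaac C. Kipchirchir, George O. Muhua, Charles R. Newton, Damazo T. Kadengye, Abankwah Junior, Albert Akpalu, Arjune Sen, Bruno Mmbando, Charles R. Newton, Cynthia Sottie, Dan Bhwana, Daniel Mtai Mwanga, Damazo T. Kadengye, Daniel Nana Yaw, David McDaid, Dorcas Muli, Emmanuel Darkwa, Frederick Murunga Wekesah, Gershim Asiki, Gergana Manolova, Guillaume Pages, Helen Cross, Henrika Kimambo, Isolide S. Massawe, Josemir W. Sander, Mary Bitta, Mercy Atieno, Neerja Chowdhary, Patrick Adjei, Peter O. Otieno, Ryan Wagner, Richard Walker, Sabina Asiamah, Samuel Iddi, Simone Grassi, Sloan Mahone, Sonia Vallentin, Stella Waruingi, Symon Kariuki, Tarun Dua, Thomas Kwasa, Timothy Denison, Tony Godi, Vivian Mushi, William Matuja |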
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2025-06-01 |
Series: | Global Epidemiology |
Subjects: | Machine learning, Attrition, Loss to follow-up, Urban settlements, Epilepsy, Prevalence |
Online Access: | http://www.sciencedirect.com/science/article/pii/S259011332500001X |
_version_ | 1832586790556401664 |
---|---|
author | Daniel M. Mwanga Isaac C. Kipchirchir George O. Muhua Charles R. Newton Damazo T. Kadengye Abankwah Junior Albert Akpalu Arjune Sen Bruno Mmbando Charles R. Newton Cynthia Sottie Dan Bhwana Daniel Mtai Mwanga Damazo T. Kadengye Daniel Nana Yaw David McDaid Dorcas Muli Emmanuel Darkwa Frederick Murunga Wekesah Gershim Asiki Gergana Manolova Guillaume Pages Helen Cross Henrika Kimambo Isolide S. Massawe Josemir W. Sander Mary Bitta Mercy Atieno Neerja Chowdhary Patrick Adjei Peter O. Otieno Ryan Wagner Richard Walker Sabina Asiamah Samuel Iddi Simone Grassi Sloan Mahone Sonia Vallentin Stella Waruingi Symon Kariuki Tarun Dua Thomas Kwasa Timothy Denison Tony Godi Vivian Mushi William Matuja |
author_sort | Daniel M. Mwanga |
collection | DOAJ |
description | Background: Attrition is a challenge in parameter estimation in both longitudinal and multi-stage cross-sectional studies. Here, we examine the utility of machine learning to predict attrition and identify associated factors in a two-stage population-based epilepsy prevalence study in Nairobi. Methods: All individuals in the Nairobi Urban Health and Demographic Surveillance System (NUHDSS) (Korogocho and Viwandani) were screened for epilepsy in two stages. Attrition was defined as probable epilepsy cases identified at stage-I but who did not attend stage-II (neurologist assessment). Categorical variables were one-hot encoded, class imbalance was addressed using the synthetic minority over-sampling technique (SMOTE), and numeric variables were scaled and centered. The dataset was split into training and testing sets (7:3 ratio), and seven machine learning models, including the ensemble Super Learner, were trained. Hyperparameters were tuned using 10-fold cross-validation, and model performance was evaluated using metrics including area under the curve (AUC), accuracy, Brier score, and F1 score over 500 bootstrap samples of the test data. Results: Random forest (AUC = 0.98, accuracy = 0.95, Brier score = 0.06, and F1 = 0.94), extreme gradient boost (XGB) (AUC = 0.96, accuracy = 0.91, Brier score = 0.08, F1 = 0.90) and support vector machine (SVM) (AUC = 0.93, accuracy = 0.93, Brier score = 0.07, F1 = 0.92) were the best performing models (base learners). Ensemble Super Learner had similarly high performance. Important predictors of attrition included proximity to industrial areas, male gender, employment, education, smaller households, and a history of complex partial seizures. Conclusion: These findings can help researchers plan targeted mobilization for scheduled clinical appointments to improve follow-up rates. They will also inform the development of a web-based algorithm to predict attrition risk and aid in targeted follow-up efforts in similar studies. |
format | Article |
id | doaj-art-5fa00d27c4f840b39f7abdb8b3a3e986 |
institution | Kabale University |
issn | 2590-1133 |
language | English |
publishDate | 2025-06-01 |
publisher | Elsevier |
record_format | Article |
series | Global Epidemiology |
spelling | doaj-art-5fa00d27c4f840b39f7abdb8b3a3e986; 2025-01-25T04:11:25Z; eng; Elsevier; Global Epidemiology; 2590-1133; 2025-06-01; 9; 100183; Modeling the determinants of attrition in a two-stage epilepsy prevalence survey in Nairobi using machine learning; Daniel M. Mwanga (Department of Mathematics, University of Nairobi, Kenya; African Population and Health Research Center, Nairobi, Kenya; corresponding author at P.O. Box 10787-00100, Kitisuru, Nairobi, Kenya); Isaac C. Kipchirchir (Department of Mathematics, University of Nairobi, Kenya); George O. Muhua (Department of Mathematics, University of Nairobi, Kenya); Charles R. Newton (Department of Psychiatry, University of Oxford, United Kingdom; Kenya Medical Research Institute, Wellcome Trust Research Programme, Kilifi, Kenya); Damazo T. Kadengye (African Population and Health Research Center, Nairobi, Kenya); Abankwah Junior; Albert Akpalu; Arjune Sen; Bruno Mmbando; Charles R. Newton; Cynthia Sottie; Dan Bhwana; Daniel Mtai Mwanga; Damazo T. Kadengye; Daniel Nana Yaw; David McDaid; Dorcas Muli; Emmanuel Darkwa; Frederick Murunga Wekesah; Gershim Asiki; Gergana Manolova; Guillaume Pages; Helen Cross; Henrika Kimambo; Isolide S. Massawe; Josemir W. Sander; Mary Bitta; Mercy Atieno; Neerja Chowdhary; Patrick Adjei; Peter O. Otieno; Ryan Wagner; Richard Walker; Sabina Asiamah; Samuel Iddi; Simone Grassi; Sloan Mahone; Sonia Vallentin; Stella Waruingi; Symon Kariuki; Tarun Dua; Thomas Kwasa; Timothy Denison; Tony Godi; Vivian Mushi; William Matuja; http://www.sciencedirect.com/science/article/pii/S259011332500001X; Machine learning; Attrition; Loss to follow-up; Urban settlements; Epilepsy; Prevalence |
title | Modeling the determinants of attrition in a two-stage epilepsy prevalence survey in Nairobi using machine learning |
topic | Machine learning Attrition Loss to follow-up Urban settlements Epilepsy Prevalence |
url | http://www.sciencedirect.com/science/article/pii/S259011332500001X |
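
The Methods summary in this record reports that model performance was evaluated with AUC, accuracy, Brier score and F1 score over 500 bootstrap samples of the test data. The following is a minimal sketch of that kind of evaluation in plain Python, not the authors' code. The function names, the 0.5 classification threshold, the percentile interval, and the toy labels and probabilities are all assumptions made for illustration; none of them come from the article.

```python
import random

def auc(y_true, y_prob):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(y_true, y_prob, threshold=0.5):
    """Share of cases whose thresholded prediction matches the label."""
    return sum((p >= threshold) == (y == 1) for y, p in zip(y_true, y_prob)) / len(y_true)

def brier(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def f1(y_true, y_prob, threshold=0.5):
    """F1 = 2TP / (2TP + FP + FN) for the positive class."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def bootstrap_metrics(y_true, y_prob, n_boot=500, seed=0):
    """Resample the test set with replacement and summarise each metric
    as (mean, 2.5th percentile, 97.5th percentile) across resamples."""
    rng = random.Random(seed)
    n = len(y_true)
    metrics = {"auc": auc, "accuracy": accuracy, "brier": brier, "f1": f1}
    draws = {name: [] for name in metrics}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        yp = [y_prob[i] for i in idx]
        if len(set(yt)) < 2:
            continue  # AUC is undefined when a resample contains a single class
        for name, fn in metrics.items():
            draws[name].append(fn(yt, yp))
    summary = {}
    for name, vals in draws.items():
        vals.sort()
        last = len(vals) - 1
        summary[name] = (sum(vals) / len(vals),
                         vals[int(0.025 * last)],
                         vals[int(0.975 * last)])
    return summary

# Toy test-set labels (1 = attrition) and predicted probabilities; made up for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.05]

for name, (mean, lo, hi) in bootstrap_metrics(y_true, y_prob).items():
    print(f"{name}: mean={mean:.3f}, 95% interval=({lo:.3f}, {hi:.3f})")
```

In practice the probabilities would come from a fitted model scored on the held-out 30% of the data. A lower Brier score indicates better-calibrated probabilities, while higher values are better for the other three metrics.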