Modeling the determinants of attrition in a two-stage epilepsy prevalence survey in Nairobi using machine learning

Background: Attrition is a challenge in parameter estimation in both longitudinal and multi-stage cross-sectional studies. Here, we examine the utility of machine learning to predict attrition and identify associated factors in a two-stage population-based epilepsy prevalence study in Nairobi. Methods:...

Bibliographic Details
Main Authors: Daniel M. Mwanga, Isaac C. Kipchirchir, George O. Muhua, Charles R. Newton, Damazo T. Kadengye, Abankwah Junior, Albert Akpalu, Arjune Sen, Bruno Mmbando, Cynthia Sottie, Dan Bhwana, Daniel Mtai Mwanga, Daniel Nana Yaw, David McDaid, Dorcas Muli, Emmanuel Darkwa, Frederick Murunga Wekesah, Gershim Asiki, Gergana Manolova, Guillaume Pages, Helen Cross, Henrika Kimambo, Isolide S. Massawe, Josemir W. Sander, Mary Bitta, Mercy Atieno, Neerja Chowdhary, Patrick Adjei, Peter O. Otieno, Ryan Wagner, Richard Walker, Sabina Asiamah, Samuel Iddi, Simone Grassi, Sloan Mahone, Sonia Vallentin, Stella Waruingi, Symon Kariuki, Tarun Dua, Thomas Kwasa, Timothy Denison, Tony Godi, Vivian Mushi, William Matuja
Format: Article
Language: English
Published: Elsevier 2025-06-01
Series: Global Epidemiology
Subjects: Machine learning; Attrition; Loss to follow-up; Urban settlements; Epilepsy; Prevalence
Online Access: http://www.sciencedirect.com/science/article/pii/S259011332500001X
_version_ 1832586790556401664
author Daniel M. Mwanga
Isaac C. Kipchirchir
George O. Muhua
Charles R. Newton
Damazo T. Kadengye
Abankwah Junior
Albert Akpalu
Arjune Sen
Bruno Mmbando
Charles R. Newton
Cynthia Sottie
Dan Bhwana
Daniel Mtai Mwanga
Damazo T. Kadengye
Daniel Nana Yaw
David McDaid
Dorcas Muli
Emmanuel Darkwa
Frederick Murunga Wekesah
Gershim Asiki
Gergana Manolova
Guillaume Pages
Helen Cross
Henrika Kimambo
Isolide S. Massawe
Josemir W. Sander
Mary Bitta
Mercy Atieno
Neerja Chowdhary
Patrick Adjei
Peter O. Otieno
Ryan Wagner
Richard Walker
Sabina Asiamah
Samuel Iddi
Simone Grassi
Sloan Mahone
Sonia Vallentin
Stella Waruingi
Symon Kariuki
Tarun Dua
Thomas Kwasa
Timothy Denison
Tony Godi
Vivian Mushi
William Matuja
author_facet Daniel M. Mwanga
Isaac C. Kipchirchir
George O. Muhua
Charles R. Newton
Damazo T. Kadengye
Abankwah Junior
Albert Akpalu
Arjune Sen
Bruno Mmbando
Charles R. Newton
Cynthia Sottie
Dan Bhwana
Daniel Mtai Mwanga
Damazo T. Kadengye
Daniel Nana Yaw
David McDaid
Dorcas Muli
Emmanuel Darkwa
Frederick Murunga Wekesah
Gershim Asiki
Gergana Manolova
Guillaume Pages
Helen Cross
Henrika Kimambo
Isolide S. Massawe
Josemir W. Sander
Mary Bitta
Mercy Atieno
Neerja Chowdhary
Patrick Adjei
Peter O. Otieno
Ryan Wagner
Richard Walker
Sabina Asiamah
Samuel Iddi
Simone Grassi
Sloan Mahone
Sonia Vallentin
Stella Waruingi
Symon Kariuki
Tarun Dua
Thomas Kwasa
Timothy Denison
Tony Godi
Vivian Mushi
William Matuja
author_sort Daniel M. Mwanga
collection DOAJ
description Background: Attrition is a challenge in parameter estimation in both longitudinal and multi-stage cross-sectional studies. Here, we examine the utility of machine learning to predict attrition and identify associated factors in a two-stage population-based epilepsy prevalence study in Nairobi. Methods: All individuals in the Nairobi Urban Health and Demographic Surveillance System (NUHDSS) (Korogocho and Viwandani) were screened for epilepsy in two stages. Attrition was defined as probable epilepsy cases identified at stage-I who did not attend stage-II (neurologist assessment). Categorical variables were one-hot encoded, class imbalance was addressed using the synthetic minority over-sampling technique (SMOTE), and numeric variables were scaled and centered. The dataset was split into training and testing sets (7:3 ratio), and seven machine learning models, including the ensemble Super Learner, were trained. Hyperparameters were tuned using 10-fold cross-validation, and model performance was evaluated using metrics such as area under the curve (AUC), accuracy, Brier score and F1 score over 500 bootstrap samples of the test data. Results: Random forest (AUC = 0.98, accuracy = 0.95, Brier score = 0.06, and F1 = 0.94), extreme gradient boost (XGB) (AUC = 0.96, accuracy = 0.91, Brier score = 0.08, F1 = 0.90) and support vector machine (SVM) (AUC = 0.93, accuracy = 0.93, Brier score = 0.07, F1 = 0.92) were the best-performing models (base learners). Ensemble Super Learner had similarly high performance. Important predictors of attrition included proximity to industrial areas, male gender, employment, education, smaller households, and a history of complex partial seizures. Conclusion: These findings can help researchers plan targeted mobilization for scheduled clinical appointments to improve follow-up rates. These findings will inform the development of a web-based algorithm to predict attrition risk and aid in targeted follow-up efforts in similar studies.
format Article
id doaj-art-5fa00d27c4f840b39f7abdb8b3a3e986
institution Kabale University
issn 2590-1133
language English
publishDate 2025-06-01
publisher Elsevier
record_format Article
series Global Epidemiology
spelling doaj-art-5fa00d27c4f840b39f7abdb8b3a3e986
2025-01-25T04:11:25Z
eng
Elsevier
Global Epidemiology
2590-1133
2025-06-01
9
100183
Modeling the determinants of attrition in a two-stage epilepsy prevalence survey in Nairobi using machine learning
Daniel M. Mwanga [0]
Isaac C. Kipchirchir [1]
George O. Muhua [2]
Charles R. Newton [3]
Damazo T. Kadengye [4]
Abankwah Junior
Albert Akpalu
Arjune Sen
Bruno Mmbando
Charles R. Newton
Cynthia Sottie
Dan Bhwana
Daniel Mtai Mwanga
Damazo T. Kadengye
Daniel Nana Yaw
David McDaid
Dorcas Muli
Emmanuel Darkwa
Frederick Murunga Wekesah
Gershim Asiki
Gergana Manolova
Guillaume Pages
Helen Cross
Henrika Kimambo
Isolide S. Massawe
Josemir W. Sander
Mary Bitta
Mercy Atieno
Neerja Chowdhary
Patrick Adjei
Peter O. Otieno
Ryan Wagner
Richard Walker
Sabina Asiamah
Samuel Iddi
Simone Grassi
Sloan Mahone
Sonia Vallentin
Stella Waruingi
Symon Kariuki
Tarun Dua
Thomas Kwasa
Timothy Denison
Tony Godi
Vivian Mushi
William Matuja
[0] Department of Mathematics, University of Nairobi, Kenya; African Population and Health Research Center, Nairobi, Kenya; Corresponding author at: P.O. Box 10787-00100, Kitisuru, Nairobi, Kenya.
[1] Department of Mathematics, University of Nairobi, Kenya
[2] Department of Mathematics, University of Nairobi, Kenya
[3] Department of Psychiatry, University of Oxford, United Kingdom; Kenya Medical Research Institute, Wellcome Trust Research Programme, Kilifi, Kenya
[4] African Population and Health Research Center, Nairobi, Kenya
Background: Attrition is a challenge in parameter estimation in both longitudinal and multi-stage cross-sectional studies. Here, we examine the utility of machine learning to predict attrition and identify associated factors in a two-stage population-based epilepsy prevalence study in Nairobi. Methods: All individuals in the Nairobi Urban Health and Demographic Surveillance System (NUHDSS) (Korogocho and Viwandani) were screened for epilepsy in two stages. Attrition was defined as probable epilepsy cases identified at stage-I who did not attend stage-II (neurologist assessment). Categorical variables were one-hot encoded, class imbalance was addressed using the synthetic minority over-sampling technique (SMOTE), and numeric variables were scaled and centered. The dataset was split into training and testing sets (7:3 ratio), and seven machine learning models, including the ensemble Super Learner, were trained. Hyperparameters were tuned using 10-fold cross-validation, and model performance was evaluated using metrics such as area under the curve (AUC), accuracy, Brier score and F1 score over 500 bootstrap samples of the test data. Results: Random forest (AUC = 0.98, accuracy = 0.95, Brier score = 0.06, and F1 = 0.94), extreme gradient boost (XGB) (AUC = 0.96, accuracy = 0.91, Brier score = 0.08, F1 = 0.90) and support vector machine (SVM) (AUC = 0.93, accuracy = 0.93, Brier score = 0.07, F1 = 0.92) were the best-performing models (base learners). Ensemble Super Learner had similarly high performance. Important predictors of attrition included proximity to industrial areas, male gender, employment, education, smaller households, and a history of complex partial seizures. Conclusion: These findings can help researchers plan targeted mobilization for scheduled clinical appointments to improve follow-up rates. These findings will inform the development of a web-based algorithm to predict attrition risk and aid in targeted follow-up efforts in similar studies.
http://www.sciencedirect.com/science/article/pii/S259011332500001X
Machine learning
Attrition
Loss to follow-up
Urban settlements
Epilepsy
Prevalence
spellingShingle Daniel M. Mwanga
Isaac C. Kipchirchir
George O. Muhua
Charles R. Newton
Damazo T. Kadengye
Abankwah Junior
Albert Akpalu
Arjune Sen
Bruno Mmbando
Charles R. Newton
Cynthia Sottie
Dan Bhwana
Daniel Mtai Mwanga
Damazo T. Kadengye
Daniel Nana Yaw
David McDaid
Dorcas Muli
Emmanuel Darkwa
Frederick Murunga Wekesah
Gershim Asiki
Gergana Manolova
Guillaume Pages
Helen Cross
Henrika Kimambo
Isolide S. Massawe
Josemir W. Sander
Mary Bitta
Mercy Atieno
Neerja Chowdhary
Patrick Adjei
Peter O. Otieno
Ryan Wagner
Richard Walker
Sabina Asiamah
Samuel Iddi
Simone Grassi
Sloan Mahone
Sonia Vallentin
Stella Waruingi
Symon Kariuki
Tarun Dua
Thomas Kwasa
Timothy Denison
Tony Godi
Vivian Mushi
William Matuja
Modeling the determinants of attrition in a two-stage epilepsy prevalence survey in Nairobi using machine learning
Global Epidemiology
Machine learning
Attrition
Loss to follow-up
Urban settlements
Epilepsy
Prevalence
title Modeling the determinants of attrition in a two-stage epilepsy prevalence survey in Nairobi using machine learning
title_full Modeling the determinants of attrition in a two-stage epilepsy prevalence survey in Nairobi using machine learning
title_fullStr Modeling the determinants of attrition in a two-stage epilepsy prevalence survey in Nairobi using machine learning
title_full_unstemmed Modeling the determinants of attrition in a two-stage epilepsy prevalence survey in Nairobi using machine learning
title_short Modeling the determinants of attrition in a two-stage epilepsy prevalence survey in Nairobi using machine learning
title_sort modeling the determinants of attrition in a two stage epilepsy prevalence survey in nairobi using machine learning
topic Machine learning
Attrition
Loss to follow-up
Urban settlements
Epilepsy
Prevalence
url http://www.sciencedirect.com/science/article/pii/S259011332500001X
work_keys_str_mv AT danielmmwanga modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT isaacckipchirchir modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT georgeomuhua modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT charlesrnewton modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT damazotkadengye modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT abankwahjunior modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT albertakpalu modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT arjunesen modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT brunommbando modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT charlesrnewton modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT cynthiasottie modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT danbhwana modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT danielmtaimwanga modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT damazotkadengye modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT danielnanayaw modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT davidmcdaid modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT dorcasmuli modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT emmanueldarkwa modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT frederickmurungawekesah modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT gershimasiki modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT gerganamanolova modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT guillaumepages modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT helencross modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT henrikakimambo modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT isolidesmassawe modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT josemirwsander modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT marybitta modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT mercyatieno modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT neerjachowdhary modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT patrickadjei modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT peterootieno modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT ryanwagner modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT richardwalker modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT sabinaasiamah modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT samueliddi modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT simonegrassi modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT sloanmahone modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT soniavallentin modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT stellawaruingi modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT symonkariuki modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT tarundua modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT thomaskwasa modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT timothydenison modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT tonygodi modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT vivianmushi modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
AT williammatuja modelingthedeterminantsofattritioninatwostageepilepsyprevalencesurveyinnairobiusingmachinelearning
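
Illustrative note: the description field of this record states that model performance was summarised with AUC, accuracy, Brier score and F1 score over 500 bootstrap samples of the test data. The block below is a minimal sketch of how such a summary could be computed. It is not the authors' code. The function names, the 0.5 classification threshold, the 95% percentile interval, and the toy labels and probabilities are all assumptions made for illustration, and only the Python standard library is used.

```python
import random
from statistics import mean


def _classify(y_prob, threshold):
    """Turn predicted probabilities into 0/1 labels (threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in y_prob]


def accuracy(y_true, y_prob, threshold=0.5):
    preds = _classify(y_prob, threshold)
    return sum(int(p == y) for p, y in zip(preds, y_true)) / len(y_true)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(y_prob, y_true)) / len(y_true)


def f1_score(y_true, y_prob, threshold=0.5):
    preds = _classify(y_prob, threshold)
    tp = sum(1 for p, y in zip(preds, y_true) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, y_true) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, y_true) if p == 0 and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true, y_prob):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [p for p, y in zip(y_prob, y_true) if y == 1]
    neg = [p for p, y in zip(y_prob, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_metrics(y_true, y_prob, n_boot=500, seed=0):
    """Resample the test set with replacement and summarise each metric.

    Returns {metric: (mean, lower, upper)} with a 95% percentile interval.
    Resamples containing a single class are skipped because AUC is undefined.
    """
    rng = random.Random(seed)
    n = len(y_true)
    metrics = {"auc": auc, "accuracy": accuracy,
               "brier": brier_score, "f1": f1_score}
    draws = {name: [] for name in metrics}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        yp = [y_prob[i] for i in idx]
        if len(set(yt)) < 2:
            continue
        for name, fn in metrics.items():
            draws[name].append(fn(yt, yp))
    if not draws["auc"]:
        raise ValueError("no bootstrap resample contained both classes")
    summary = {}
    for name, values in draws.items():
        values.sort()
        lower = values[int(0.025 * (len(values) - 1))]
        upper = values[int(0.975 * (len(values) - 1))]
        summary[name] = (mean(values), lower, upper)
    return summary


# Hypothetical test-set labels (1 = did not attend stage-II) and predicted risks.
TOY_LABELS = [0, 0, 1, 1, 0, 1, 1, 0, 1, 0]
TOY_PROBS = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.70, 0.60, 0.55, 0.30]

if __name__ == "__main__":
    for name, (avg, lower, upper) in bootstrap_metrics(TOY_LABELS, TOY_PROBS).items():
        print(f"{name}: {avg:.3f} ({lower:.3f} to {upper:.3f})")
```

Resampling only the held-out test set, as the description indicates, measures the variability of the reported metrics without refitting any model. Fixing the seed makes the summary reproducible.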