Application of machine learning in early childhood development research: a scoping review

Bibliographic Details
Main Authors: Akbar K Waljee, Amina Abubakar, Patrick N Mwangala, Faith Neema Benson, Daisy Chelangat, Willie Brink, Cheryl A Moyer
Format: Article
Language: English
Published: BMJ Publishing Group, 2025-08-01
Series: BMJ Open
Online Access:https://bmjopen.bmj.com/content/15/8/e100358.full
collection DOAJ
description Background: Early childhood development (ECD) lays the foundation for lifelong health, academic success and social well-being, yet over 250 million children in low- and middle-income countries are at risk of not reaching their developmental potential. Traditional measures do not fully capture the risks associated with a child's developmental outcomes. Artificial intelligence techniques, particularly machine learning (ML), offer an innovative approach by analysing complex datasets to detect subtle developmental patterns.
Objective: To map the existing literature on the use of ML in ECD research, including its geographical distribution, in order to identify research gaps and inform future directions. The review focuses on applied ML techniques, data types, feature sets, outcomes, data splitting and validation strategies, model performance, model explainability, key themes, clinical relevance and reported limitations.
Design: Scoping review using the Arksey and O’Malley framework, with enhancements by Levac et al.
Data sources: A systematic search was conducted on 16 June 2024 across PubMed, Web of Science, IEEE Xplore and PsycINFO, supplemented by grey literature (OpenGrey) and hand-searching of reference lists. No publication date limits were applied.
Eligibility criteria: Included studies applied ML or its variants (eg, deep learning (DL), natural language processing) to developmental outcomes in children aged 0–8 years. Studies had to be in English and address cognitive, language, motor or social-emotional development. Excluded were review articles and studies focusing on robotics, on disease or medical conditions, or on neurodevelopmental disorders such as autism spectrum disorder, attention-deficit/hyperactivity disorder and communication disorders.
Data extraction and charting: Three reviewers independently extracted data using a structured MS Excel template covering each study's ML techniques, data types, feature sets, outcomes, outcome measures, data splitting and validation strategies, model performance, model explainability, key themes, clinical relevance and limitations. A narrative synthesis was conducted, supported by descriptive statistics and visualisations.
Results: Of the 759 articles retrieved, 27 met the inclusion criteria. Most studies (78%) originated from high-income countries, and none came from sub-Saharan Africa. Supervised ML classifiers (40.7%) and DL techniques (22.2%) were the most commonly used approaches. Cognitive development was the most frequently targeted outcome (33.3%) and was often measured using the Bayley Scales of Infant and Toddler Development-III (33.3%). Data types varied, with image, video and sensor-based data being the most prevalent. Key predictive features were grouped into six categories: brain features; anthropometric and clinical/biological markers; socio-demographic and environmental factors; medical history and nutritional indicators; linguistic and expressive features; and motor indicators. Most studies (74.1%) focused solely on prediction, with the majority making predictions at age 2 years or older. Only 41% of studies employed explainability methods, and validation strategies varied widely. Few studies (7.4%) conducted external validation, and only one had progressed to a clinical trial. Common limitations included small sample sizes, lack of external validation and imbalanced datasets.
Conclusion: There is growing interest in using ML for ECD research, but current research lacks geographical diversity, external validation, explainability and practical implementation. Future work should focus on developing inclusive, interpretable and externally validated models and integrating them into real-world practice.
id doaj-art-d2d91ca8486945dbbf86872e09b2c33c
issn 2044-6055
doi 10.1136/bmjopen-2025-100358
volume 15
issue 8
affiliations Akbar K Waljee: Department of Learning Health Sciences, University of Michigan, Ann Arbor, Michigan, USA
Amina Abubakar: Institute for Human Development, The Aga Khan University, Nairobi, Kenya
Patrick N Mwangala: Institute for Human Development, The Aga Khan University, Nairobi, Kenya
Faith Neema Benson: Institute for Human Development, The Aga Khan University, Nairobi, Kenya
Daisy Chelangat: Institute for Human Development, The Aga Khan University, Nairobi, Kenya
Willie Brink: Department of Mathematical Sciences, Stellenbosch University, Stellenbosch, South Africa
Cheryl A Moyer: Department of Learning Health Sciences, University of Michigan, Ann Arbor, Michigan, USA