Assessing machine learning for fair prediction of ADHD in school pupils using a retrospective cohort study of linked education and healthcare data

Bibliographic Details
Main Authors: Johnny Downs, Robert Stewart, Alice Wickersham, Sumithra Velupillai, Lucile Ter-Minassian, Natalia Viani, Lauren Cross
Format: Article
Language: English
Published: BMJ Publishing Group 2022-12-01
Series: BMJ Open
Online Access: https://bmjopen.bmj.com/content/12/12/e058058.full
collection DOAJ
description
Objectives: Attention deficit hyperactivity disorder (ADHD) is a prevalent childhood disorder, but often goes unrecognised and untreated. To improve access to services, accurate predictions of populations at high risk of ADHD are needed for effective resource allocation. Using a unique linked health and education data resource, we examined how machine learning (ML) approaches can predict risk of ADHD.
Design: Retrospective population cohort study.
Setting: South London (2007–2013).
Participants: n=56 258 pupils with linked education and health data.
Primary outcome measures: Using area under the curve (AUC), we compared the predictive accuracy of four ML models and one neural network for ADHD diagnosis. Ethnic group and language biases were weighted using a fair pre-processing algorithm.
Results: Random forest and logistic regression prediction models provided the highest predictive accuracy for ADHD in population samples (AUC 0.86 and 0.86, respectively) and clinical samples (AUC 0.72 and 0.70). Precision-recall curve analyses were less favourable. Sociodemographic biases were effectively reduced by a fair pre-processing algorithm without loss of accuracy.
Conclusions: ML approaches using linked routinely collected education and health data offer accurate, low-cost and scalable prediction models of ADHD. These approaches could help identify areas of need and inform resource allocation. Introducing ‘fairness weighting’ attenuates some sociodemographic biases which would otherwise underestimate ADHD risk within minority groups.
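The abstract does not name the fair pre-processing algorithm that was used. Purely as an illustration of what 'fairness weighting' can mean, the sketch below implements one well-known pre-processing approach, reweighing, which gives each sample the weight P(group) × P(label) / P(group, label) so that group membership and outcome look statistically independent in the weighted data. The choice of reweighing, the function names and the toy data are all assumptions made for this example; none of them come from the study.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Return one weight per sample: P(group) * P(label) / P(group, label).

    Illustrative only: this is a generic reweighing scheme, not necessarily
    the algorithm used in the study described above.
    """
    if len(groups) != len(labels):
        raise ValueError("groups and labels must have the same length")
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    # Counts replace probabilities: (n_g / n) * (n_y / n) / (n_gy / n)
    return [
        group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels (label == 1) within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    positive = sum(
        w for g, y, w in zip(groups, labels, weights) if g == group and y == 1
    )
    return positive / total


# Made-up toy data: group "a" is diagnosed (label 1) more often than group "b".
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
rate_a = weighted_positive_rate(groups, labels, weights, "a")
rate_b = weighted_positive_rate(groups, labels, weights, "b")
print(rate_a, rate_b)
```

In this toy data the unweighted positive rates are 0.75 and 0.25; after weighting, both groups have the same weighted positive rate. Weights of this kind would typically be passed to a classifier as per-sample weights during training.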
id doaj-art-9838b0378d0543a8b7cd45ec2303a585
institution Kabale University
issn 2044-6055
doi 10.1136/bmjopen-2021-058058
volume 12
issue 12
author affiliations (in author order)
South London and Maudsley NHS Foundation Trust, NIHR Maudsley Biomedical Research Centre, London, UK
Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, UK
CAMHS Digital Lab, Department of Child and Adolescent Psychiatry, Institute of Psychiatry, Psychology & Neuroscience, King’s College London, London, UK
Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, UK
Department of Psychological Medicine, King’s College London, London, UK
Department of Psychological Medicine, King’s College London, London, UK
Department of Psychological Medicine, King’s College London, London, UK