Low-cost and scalable machine learning model for identifying children and adolescents with poor oral health using survey data: An empirical study in Portugal.

This empirical study assessed the potential of developing a machine-learning model to identify children and adolescents with poor oral health using only self-reported survey data. Such a model could enable scalable and cost-effective screening and targeted interventions, optimizing limited resources...

Bibliographic Details
Main Authors: Susana Lavado, Eduardo Costa, Niclas F Sturm, Johannes S Tafferner, Octávio Rodrigues, Pedro Pita Barros, Leid Zejnilovic
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0312075
author Susana Lavado
Eduardo Costa
Niclas F Sturm
Johannes S Tafferner
Octávio Rodrigues
Pedro Pita Barros
Leid Zejnilovic
collection DOAJ
description This empirical study assessed the potential of developing a machine-learning model to identify children and adolescents with poor oral health using only self-reported survey data. Such a model could enable scalable and cost-effective screening and targeted interventions, optimizing limited resources to improve oral health outcomes. To train and test the model, we used data from 2,133 students attending schools in a Portuguese municipality. Poor oral health (the dependent variable) was defined as having a Decayed, Missing, and Filled Teeth index for deciduous teeth (dmft) or permanent teeth (DMFT) at or above expert-defined thresholds (dmft/DMFT ≥ 3 or 4). The survey provided information about the students' oral health habits, knowledge, and beliefs, as well as their food and physical activity habits, which served as independent variables. On the precision@k metric, logistic regression models with variables selected through low-variance filtering and recursive feature elimination outperformed models trained with more complex machine learning algorithms, as well as random selection and expert rule-based models, in identifying students with poor oral health. The proposed models are inherently explainable and broadly applicable, which, given the context, could compensate for their lower performance (Area Under the Curve = 0.64-0.70) compared to similar approaches and models. This study is one of the few in oral health care that includes bias auditing of classification models. The audit surfaced potential biases related to demographic factors such as age and social assistance status. Addressing these biases without significantly compromising model performance remains a challenge. The results confirm the feasibility of survey-based machine learning models for identifying individuals with poor oral health, but further validation of this approach and pilot testing in field trials are necessary.
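The abstract ranks models by precision@k and compares them with random selection. The following is a minimal sketch of how such a comparison could be computed, not the authors' implementation: the function name `precision_at_k`, the toy labels, and the toy scores are all assumptions made for illustration. Under random selection, the expected precision equals the base rate of poor oral health in the sample.

```python
def precision_at_k(y_true, scores, k):
    """Fraction of true positives among the k highest-scoring items."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of items")
    # Rank item indices by descending score (ties keep their original order).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_k = ranked[:k]
    return sum(y_true[i] for i in top_k) / k


# Hypothetical toy data: 1 = poor oral health, 0 = otherwise.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
# Hypothetical model scores (e.g. predicted probabilities from a classifier).
scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.6, 0.3, 0.4]

model_p_at_4 = precision_at_k(y_true, scores, k=4)
# Expected precision of picking k students at random is the base rate.
base_rate = sum(y_true) / len(y_true)

print(model_p_at_4, base_rate)
```

In a screening setting, k would correspond to the number of students that available resources allow to be examined, so the metric measures how many of those examinations reach students who actually have poor oral health.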
format Article
id doaj-art-66994a6c08584775b83d7be7810c35ba
institution Kabale University
issn 1932-6203
language English
publishDate 2025-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS ONE
spelling doaj-art-66994a6c08584775b83d7be7810c35ba | 2025-02-05T05:32:14Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2025-01-01 | 20 | 1 | e0312075 | 10.1371/journal.pone.0312075 | Low-cost and scalable machine learning model for identifying children and adolescents with poor oral health using survey data: An empirical study in Portugal. | Susana Lavado; Eduardo Costa; Niclas F Sturm; Johannes S Tafferner; Octávio Rodrigues; Pedro Pita Barros; Leid Zejnilovic | https://doi.org/10.1371/journal.pone.0312075
title Low-cost and scalable machine learning model for identifying children and adolescents with poor oral health using survey data: An empirical study in Portugal.
url https://doi.org/10.1371/journal.pone.0312075