Low-cost and scalable machine learning model for identifying children and adolescents with poor oral health using survey data: An empirical study in Portugal.

This empirical study assessed the potential of developing a machine-learning model to identify children and adolescents with poor oral health using only self-reported survey data. Such a model could enable scalable and cost-effective screening and targeted interventions, optimizing limited resources...

Bibliographic Details
Main Authors:	Susana Lavado, Eduardo Costa, Niclas F Sturm, Johannes S Tafferner, Octávio Rodrigues, Pedro Pita Barros, Leid Zejnilovic
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2025-01-01
Series:	PLoS ONE
Online Access:	https://doi.org/10.1371/journal.pone.0312075

_version_	1832540185053626368
author	Susana Lavado Eduardo Costa Niclas F Sturm Johannes S Tafferner Octávio Rodrigues Pedro Pita Barros Leid Zejnilovic
author_facet	Susana Lavado Eduardo Costa Niclas F Sturm Johannes S Tafferner Octávio Rodrigues Pedro Pita Barros Leid Zejnilovic
author_sort	Susana Lavado
collection	DOAJ
description	This empirical study assessed the potential of developing a machine-learning model to identify children and adolescents with poor oral health using only self-reported survey data. Such a model could enable scalable and cost-effective screening and targeted interventions, optimizing limited resources to improve oral health outcomes. To train and test the model, we used data from 2,133 students attending schools in a Portuguese municipality. Poor oral health (the dependent variable) was defined as having a Decayed, Missing, and Filled Teeth index for deciduous teeth (dmft) or permanent teeth (DMFT) above expert-defined thresholds (dmft/DMFT ≥ 3 or 4). The survey provided information about the students' oral health habits, knowledge, and beliefs, as well as their food and physical activity habits, which served as independent variables. Judged by the precision@k metric, logistic regression models with variables selected through low-variance filtering and recursive feature elimination outperformed models trained with more complex machine learning algorithms, as well as random selection and expert rule-based models, in identifying students with poor oral health. The proposed models are inherently explainable and broadly applicable, which, given the context, could compensate for their lower performance (Area Under the Curve = 0.64-0.70) compared to similar approaches and models. This study is one of the few in oral health care that includes bias auditing of classification models. The audit surfaced potential biases related to demographic factors such as age and social assistance status. Addressing these biases without significantly compromising model performance remains a challenge. The results confirm the feasibility of survey-based machine learning models for identifying individuals with poor oral health, but further validation of this approach and pilot testing in field trials are necessary.
format	Article
id	doaj-art-66994a6c08584775b83d7be7810c35ba
institution	Kabale University
issn	1932-6203
language	English
publishDate	2025-01-01
publisher	Public Library of Science (PLoS)
record_format	Article
series	PLoS ONE
spelling	doaj-art-66994a6c08584775b83d7be7810c35ba | 2025-02-05T05:32:14Z | eng | Public Library of Science (PLoS) | PLoS ONE | 1932-6203 | 2025-01-01 | 20 | 1 | e0312075 | 10.1371/journal.pone.0312075 | Low-cost and scalable machine learning model for identifying children and adolescents with poor oral health using survey data: An empirical study in Portugal. | Susana Lavado; Eduardo Costa; Niclas F Sturm; Johannes S Tafferner; Octávio Rodrigues; Pedro Pita Barros; Leid Zejnilovic | https://doi.org/10.1371/journal.pone.0312075
spellingShingle	Susana Lavado Eduardo Costa Niclas F Sturm Johannes S Tafferner Octávio Rodrigues Pedro Pita Barros Leid Zejnilovic Low-cost and scalable machine learning model for identifying children and adolescents with poor oral health using survey data: An empirical study in Portugal. PLoS ONE
title	Low-cost and scalable machine learning model for identifying children and adolescents with poor oral health using survey data: An empirical study in Portugal.
title_full	Low-cost and scalable machine learning model for identifying children and adolescents with poor oral health using survey data: An empirical study in Portugal.
title_fullStr	Low-cost and scalable machine learning model for identifying children and adolescents with poor oral health using survey data: An empirical study in Portugal.
title_full_unstemmed	Low-cost and scalable machine learning model for identifying children and adolescents with poor oral health using survey data: An empirical study in Portugal.
title_short	Low-cost and scalable machine learning model for identifying children and adolescents with poor oral health using survey data: An empirical study in Portugal.
title_sort	low cost and scalable machine learning model for identifying children and adolescents with poor oral health using survey data an empirical study in portugal
url	https://doi.org/10.1371/journal.pone.0312075
work_keys_str_mv	AT susanalavado lowcostandscalablemachinelearningmodelforidentifyingchildrenandadolescentswithpoororalhealthusingsurveydataanempiricalstudyinportugal AT eduardocosta lowcostandscalablemachinelearningmodelforidentifyingchildrenandadolescentswithpoororalhealthusingsurveydataanempiricalstudyinportugal AT niclasfsturm lowcostandscalablemachinelearningmodelforidentifyingchildrenandadolescentswithpoororalhealthusingsurveydataanempiricalstudyinportugal AT johannesstafferner lowcostandscalablemachinelearningmodelforidentifyingchildrenandadolescentswithpoororalhealthusingsurveydataanempiricalstudyinportugal AT octaviorodrigues lowcostandscalablemachinelearningmodelforidentifyingchildrenandadolescentswithpoororalhealthusingsurveydataanempiricalstudyinportugal AT pedropitabarros lowcostandscalablemachinelearningmodelforidentifyingchildrenandadolescentswithpoororalhealthusingsurveydataanempiricalstudyinportugal AT leidzejnilovic lowcostandscalablemachinelearningmodelforidentifyingchildrenandadolescentswithpoororalhealthusingsurveydataanempiricalstudyinportugal
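
The modelling steps named in the description (low-variance filtering, recursive feature elimination, logistic regression, ranking by precision@k) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic stand-ins generated with scikit-learn, and the variance threshold, the number of retained features, the value of k, and the names `precision_at_k` and `build_pipeline` are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, VarianceThreshold
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


def precision_at_k(y_true, scores, k):
    """Share of true positives among the k cases with the highest scores."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    top_k = np.argsort(-scores, kind="stable")[:k]  # indices of the k top-ranked cases
    return float(y_true[top_k].mean())


def build_pipeline(n_features=10):
    """Low-variance filter, then RFE, then logistic regression (assumed settings)."""
    return Pipeline([
        ("low_variance", VarianceThreshold(threshold=0.01)),
        ("rfe", RFE(LogisticRegression(max_iter=1000),
                    n_features_to_select=n_features)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])


# Synthetic stand-in for survey answers (X) and a poor-oral-health label (y).
X, y = make_classification(n_samples=2000, n_features=30, n_informative=8,
                           weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = build_pipeline().fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]  # predicted probability of the positive class

k = 100  # hypothetical screening capacity: how many students can be examined
print("precision@k:", precision_at_k(y_test, scores, k))
print("base rate (expected precision of random selection):", float(y_test.mean()))
```

Comparing precision@k with the base rate mirrors the comparison against random selection mentioned in the description: with a fixed screening capacity of k students, the metric reports how many of those selected actually have the condition.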

Similar Items