Low-cost and scalable machine learning model for identifying children and adolescents with poor oral health using survey data: An empirical study in Portugal.
This empirical study assessed the potential of developing a machine-learning model to identify children and adolescents with poor oral health using only self-reported survey data. Such a model could enable scalable and cost-effective screening and targeted interventions, optimizing limited resources...
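Main Authors: | Susana Lavado, Eduardo Costa, Niclas F Sturm, Johannes S Tafferner, Octávio Rodrigues, Pedro Pita Barros, Leid Zejnilovic |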
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2025-01-01 |
Series: | PLoS ONE |
Online Access: | https://doi.org/10.1371/journal.pone.0312075 |
_version_ | 1832540185053626368 |
---|---|
author | Susana Lavado, Eduardo Costa, Niclas F Sturm, Johannes S Tafferner, Octávio Rodrigues, Pedro Pita Barros, Leid Zejnilovic |
author_sort | Susana Lavado |
collection | DOAJ |
description | This empirical study assessed the potential of developing a machine-learning model to identify children and adolescents with poor oral health using only self-reported survey data. Such a model could enable scalable and cost-effective screening and targeted interventions, optimizing limited resources to improve oral health outcomes. To train and test the model, we used data from 2,133 students attending schools in a Portuguese municipality. Poor oral health (the dependent variable) was defined as having a Decayed, Missing, and Filled Teeth index for deciduous teeth (dmft) or permanent teeth (DMFT) at or above expert-defined thresholds (dmft/DMFT ≥ 3 or 4). The survey provided information about the students' oral health habits, knowledge, and beliefs, as well as their food and physical activity habits, which served as independent variables. On the precision@k metric, logistic regression models with variables selected through low-variance filtering and recursive feature elimination outperformed models trained with more complex machine learning algorithms, as well as random selection and expert rule-based models, in identifying students with poor oral health. The proposed models are inherently explainable and broadly applicable, which, given the context, could compensate for their lower performance (Area Under the Curve = 0.64-0.70) compared with similar approaches and models. This study is one of the few in oral health care that includes bias auditing of classification models. The audit surfaced potential biases related to demographic factors such as age and social assistance status. Addressing these biases without significantly compromising model performance remains a challenge. The results confirm the feasibility of survey-based machine learning models for identifying individuals with poor oral health, but further validation of this approach and pilot testing in field trials are necessary. |
format | Article |
id | doaj-art-66994a6c08584775b83d7be7810c35ba |
institution | Kabale University |
issn | 1932-6203 |
language | English |
publishDate | 2025-01-01 |
publisher | Public Library of Science (PLoS) |
record_format | Article |
series | PLoS ONE |
title | Low-cost and scalable machine learning model for identifying children and adolescents with poor oral health using survey data: An empirical study in Portugal. |
url | https://doi.org/10.1371/journal.pone.0312075 |
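
The description in this record ranks models by the precision@k metric and compares them with random selection. The following is a minimal sketch of how such a metric can be computed, not the authors' code: the function names `precision_at_k` and `random_baseline`, and the scores and labels, are hypothetical and chosen only for illustration. It assumes each student receives a risk score and that the k highest-scored students are selected for screening.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of true positives among the k highest-scored items."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of items")
    # Rank indices by descending score; ties keep their original order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_k = ranked[:k]
    return sum(labels[i] for i in top_k) / k


def random_baseline(labels: Sequence[int]) -> float:
    """Expected precision@k under random selection, i.e. the prevalence."""
    return sum(labels) / len(labels)


# Hypothetical data: model risk scores and true labels (1 = poor oral health).
scores = [0.91, 0.15, 0.78, 0.40, 0.66, 0.05, 0.83, 0.30]
labels = [1, 0, 1, 0, 0, 0, 1, 1]

print(precision_at_k(scores, labels, k=3))
print(random_baseline(labels))
```

A model is useful for targeting only if its precision@k exceeds the random baseline at the value of k that matches the available screening capacity.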