Large Language Models in Dental Licensing Examinations: Systematic Review and Meta-Analysis

Introduction and aims: This systematic review and meta-analysis evaluates the performance of various large language models (LLMs) in dental licensing examinations worldwide. The aim is to assess the accuracy of these models across different linguistic and geographical contexts and thereby inform their potential application in dental education and diagnostics.

Methods: Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, we conducted a comprehensive search of PubMed, Web of Science, and Scopus for studies published from 1 January 2022 to 1 May 2024. Two authors independently screened the literature against the inclusion and exclusion criteria, extracted data, and assessed study quality using the Quality Assessment of Diagnostic Accuracy Studies-2 tool. We conducted qualitative and quantitative analyses to evaluate the performance of the LLMs.

Results: Eleven studies met the inclusion criteria, encompassing dental licensing examinations from eight countries. GPT-3.5, GPT-4, and Bard achieved pooled accuracy rates of 54%, 72%, and 56%, respectively. GPT-4 outperformed GPT-3.5 and Bard, passing more than half of the dental licensing examinations. Subgroup analyses and meta-regression showed that GPT-3.5 performed significantly better in English-speaking countries, whereas GPT-4's performance remained consistent across regions.

Conclusion: LLMs, particularly GPT-4, show potential in dental education and diagnostics, yet their accuracy remains below the threshold required for clinical application. Insufficient dentistry-specific training data and the field's reliance on image-based diagnostics limit LLM accuracy, which is lower on dental examinations than on medical licensing examinations. Moreover, the LLMs often provided more detailed explanations for incorrect answers than for correct ones. Overall, current LLMs are not yet suitable for use in dental education or clinical diagnosis.


Bibliographic Details
Main Authors: Mingxin Liu, Tsuyoshi Okuhara, Wenbo Huang, Atsushi Ogihara, Hikari Sophia Nagao, Hiroko Okada, Takahiro Kiuchi
Format: Article
Language:English
Published: Elsevier 2025-02-01
Series:International Dental Journal
Subjects: Dentistry; Systematic review; Oral medicine; Dental education; Healthcare
Online Access:http://www.sciencedirect.com/science/article/pii/S0020653924015685
collection DOAJ
description Introduction and aims: This systematic review and meta-analysis evaluates the performance of various large language models (LLMs) in dental licensing examinations worldwide. The aim is to assess the accuracy of these models across different linguistic and geographical contexts and thereby inform their potential application in dental education and diagnostics. Methods: Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, we conducted a comprehensive search of PubMed, Web of Science, and Scopus for studies published from 1 January 2022 to 1 May 2024. Two authors independently screened the literature against the inclusion and exclusion criteria, extracted data, and assessed study quality using the Quality Assessment of Diagnostic Accuracy Studies-2 tool. We conducted qualitative and quantitative analyses to evaluate the performance of the LLMs. Results: Eleven studies met the inclusion criteria, encompassing dental licensing examinations from eight countries. GPT-3.5, GPT-4, and Bard achieved pooled accuracy rates of 54%, 72%, and 56%, respectively. GPT-4 outperformed GPT-3.5 and Bard, passing more than half of the dental licensing examinations. Subgroup analyses and meta-regression showed that GPT-3.5 performed significantly better in English-speaking countries, whereas GPT-4's performance remained consistent across regions. Conclusion: LLMs, particularly GPT-4, show potential in dental education and diagnostics, yet their accuracy remains below the threshold required for clinical application. Insufficient dentistry-specific training data and the field's reliance on image-based diagnostics limit LLM accuracy, which is lower on dental examinations than on medical licensing examinations. Moreover, the LLMs often provided more detailed explanations for incorrect answers than for correct ones. Overall, current LLMs are not yet suitable for use in dental education or clinical diagnosis.
issn 0020-6539
spelling International Dental Journal, vol. 75, no. 1, pp. 213-222, Elsevier, 2025-02-01 (ISSN 0020-6539)
Large Language Models in Dental Licensing Examinations: Systematic Review and Meta-Analysis
Author affiliations:
Mingxin Liu: Department of Health Communication, Graduate School of Medicine, The University of Tokyo, Bunkyo, Tokyo, Japan (corresponding author; Hongo 7-3-1, Bunkyo 113-8655, Tokyo, Japan)
Tsuyoshi Okuhara: Department of Health Communication, School of Public Health, Graduate School of Medicine, The University of Tokyo, Bunkyo, Tokyo, Japan
Wenbo Huang: Department of Clinical Epidemiology and Health Economics, School of Public Health, The University of Tokyo, Bunkyo, Tokyo, Japan
Atsushi Ogihara: Faculty of Human Sciences, Waseda University, Tokorozawa, Japan
Hikari Sophia Nagao: Department of Health Communication, Graduate School of Medicine, The University of Tokyo, Bunkyo, Tokyo, Japan
Hiroko Okada: Department of Health Communication, School of Public Health, Graduate School of Medicine, The University of Tokyo, Bunkyo, Tokyo, Japan
Takahiro Kiuchi: Department of Health Communication, School of Public Health, Graduate School of Medicine, The University of Tokyo, Bunkyo, Tokyo, Japan
topic Dentistry
Systematic review
Oral medicine
Dental education
Healthcare