Evaluating the Performance of Artificial Intelligence-Based Large Language Models in Orthodontics—A Systematic Review and Meta-Analysis

Background: In recent years, there has been remarkable growth in AI-based applications in healthcare, with a significant breakthrough marked by the launch of large language models (LLMs) such as ChatGPT and Google Bard. Patients and health professional students commonly utilize these models due to t...

Bibliographic Details
Main Authors:	Farraj Albalawi, Sanjeev B. Khanagar, Kiran Iyer, Nora Alhazmi, Afnan Alayyash, Anwar S. Alhazmi, Mohammed Awawdeh, Oinam Gokulchandra Singh
Format:	Article
Language:	English
Published:	MDPI AG 2025-01-01
Series:	Applied Sciences
Subjects:	artificial intelligence, deep learning, machine learning, large language models, orthodontics, clear aligners
Online Access:	https://www.mdpi.com/2076-3417/15/2/893

_version_	1832589228640305152
author	Farraj Albalawi, Sanjeev B. Khanagar, Kiran Iyer, Nora Alhazmi, Afnan Alayyash, Anwar S. Alhazmi, Mohammed Awawdeh, Oinam Gokulchandra Singh
author_facet	Farraj Albalawi, Sanjeev B. Khanagar, Kiran Iyer, Nora Alhazmi, Afnan Alayyash, Anwar S. Alhazmi, Mohammed Awawdeh, Oinam Gokulchandra Singh
author_sort	Farraj Albalawi
collection	DOAJ
description	Background: In recent years, there has been remarkable growth in AI-based applications in healthcare, with a significant breakthrough marked by the launch of large language models (LLMs) such as ChatGPT and Google Bard. Patients and health professional students commonly use these models because they are easily accessible. The increasing use of LLMs in healthcare necessitates an evaluation of their ability to generate accurate and reliable responses. Objective: This study assessed the performance of LLMs in answering orthodontic-related queries through a systematic review and meta-analysis. Methods: A comprehensive search of PubMed, Web of Science, Embase, Scopus, and Google Scholar was conducted up to 31 October 2024. The quality of the included studies was evaluated using the Prediction model Risk of Bias Assessment Tool (PROBAST), and RStudio software (Version 4.4.0) was employed for meta-analysis and heterogeneity assessment. Results: Out of 278 retrieved articles, 10 studies were included. The most commonly used LLM was ChatGPT (10/10, 100% of papers), followed by Google’s Bard/Gemini (3/10, 30% of papers) and Microsoft’s Bing/Copilot AI (2/10, 20% of papers). Accuracy was primarily evaluated using Likert scales, while the DISCERN tool was frequently applied for reliability assessment. The meta-analysis indicated that ChatGPT-4 and the other LLMs did not differ significantly in generating responses to queries related to the specialty of orthodontics. The forest plot revealed a standardized mean difference (SMD) of 0.01 [CI: -0.42 to 0.44]. No heterogeneity was observed between the experimental group (ChatGPT-3.5, Gemini, and Copilot) and the control group (ChatGPT-4). However, most studies exhibited a high PROBAST risk of bias due to the lack of standardized evaluation tools. Conclusions: ChatGPT-4 has been extensively used for a variety of tasks and has demonstrated advanced and encouraging outcomes compared to other LLMs, and thus can be regarded as a valuable tool for enhancing educational and learning experiences. While LLMs can generate comprehensive responses, their reliability is compromised by the absence of peer-reviewed references, necessitating expert oversight in healthcare applications.
format	Article
id	doaj-art-f85d400e17734677b9db3ca775643557
institution	Kabale University
issn	2076-3417
language	English
publishDate	2025-01-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj-art-f85d400e17734677b9db3ca775643557; 2025-01-24T13:21:14Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2025-01-01; 15; 2; 893; 10.3390/app15020893; Evaluating the Performance of Artificial Intelligence-Based Large Language Models in Orthodontics—A Systematic Review and Meta-Analysis; Farraj Albalawi (0), Sanjeev B. Khanagar (1), Kiran Iyer (2), Nora Alhazmi (3), Afnan Alayyash (4), Anwar S. Alhazmi (5), Mohammed Awawdeh (6), Oinam Gokulchandra Singh (7); (0) Preventive Dental Science Department, College of Dentistry, King Saud Bin Abdulaziz University for Health Sciences, Riyadh 11426, Saudi Arabia; (1) Preventive Dental Science Department, College of Dentistry, King Saud Bin Abdulaziz University for Health Sciences, Riyadh 11426, Saudi Arabia; (2) Preventive Dental Science Department, College of Dentistry, King Saud Bin Abdulaziz University for Health Sciences, Riyadh 11426, Saudi Arabia; (3) Preventive Dental Science Department, College of Dentistry, King Saud Bin Abdulaziz University for Health Sciences, Riyadh 11426, Saudi Arabia; (4) Department of Preventive Dentistry, College of Dentistry, Jouf University, Sakaka 72345, Saudi Arabia; (5) Department of Preventive Dentistry, College of Dentistry, Jazan University, Jazan 45142, Saudi Arabia; (6) Preventive Dental Science Department, College of Dentistry, King Saud Bin Abdulaziz University for Health Sciences, Riyadh 11426, Saudi Arabia; (7) King Abdullah International Medical Research Center, Ministry of National Guard Health Affairs, Riyadh 11481, Saudi Arabia; https://www.mdpi.com/2076-3417/15/2/893; artificial intelligence, deep learning, machine learning, large language models, orthodontics, clear aligners
title	Evaluating the Performance of Artificial Intelligence-Based Large Language Models in Orthodontics—A Systematic Review and Meta-Analysis
topic	artificial intelligence, deep learning, machine learning, large language models, orthodontics, clear aligners
url	https://www.mdpi.com/2076-3417/15/2/893

Similar Items