Accuracy of Artificial Intelligence Based Chatbots in Analyzing Orthopedic Pathologies: An Experimental Multi-Observer Analysis

Bibliographic Details
Main Authors: Tobias Gehlen, Theresa Joost, Philipp Solbrig, Katharina Stahnke, Robert Zahn, Markus Jahn, Dominik Adl Amini, David Alexander Back
Format: Article
Language:English
Published: MDPI AG 2025-01-01
Series:Diagnostics
Subjects: orthopedics; traumatology; chatbots; artificial intelligence; symptoms; mobile health
Online Access:https://www.mdpi.com/2075-4418/15/2/221
collection DOAJ
description <b>Background and Objective:</b> The rapid development of artificial intelligence (AI) is impacting the medical sector by offering new possibilities for faster and more accurate diagnoses. Symptom checker apps show potential for supporting patient decision-making in this regard. Whether the AI-based decision-making of symptom checker apps performs better than physicians in diagnostic accuracy and urgency assessment remains unclear. This study therefore investigated the performance of existing symptom checker apps on orthopedic and traumatology cases compared to physicians in the field. <b>Methods:</b> Thirty fictitious case vignettes of common conditions in trauma surgery and orthopedics were retrospectively examined by four orthopedic and traumatology specialists and four different symptom checker apps for diagnostic accuracy and the recommended urgency of measures. From the estimates provided by the doctors and the individual symptom checker apps, the percentages of correct diagnoses and appropriate assessments of treatment urgency were calculated as mean ± standard deviation (SD) in %. Data were analyzed for accuracy and for correlation between the apps and physicians using a nonparametric Spearman’s correlation test (<i>p</i> < 0.05). <b>Results:</b> The physicians provided the correct diagnosis in 84.4 ± 18.4% of cases (range: 53.3 to 96.7%), and the symptom checker apps in 35.8 ± 1.0% of cases (range: 26.7 to 54.2%). Agreement on diagnostic accuracy varied from low to high (physicians vs. physicians: Spearman’s ρ: 0.143 to 0.538; physicians vs. apps: Spearman’s ρ: 0.007 to 0.358), depending on the individual physicians and apps. Across the whole case set, the physicians correctly assessed the urgency level in 70.0 ± 4.7% (range: 66.7 to 73.3%) and the apps in 20.6 ± 5.6% (range: 10.8 to 37.5%) of cases. Agreement on the accuracy of the estimated urgency levels was moderate to high between and within physicians and individual apps. <b>Conclusions:</b> AI-based symptom checker apps for diagnosis in orthopedics and traumatology do not yet provide a more accurate analysis of diagnosis and urgency than physicians. However, accuracy varies broadly between different digital tools. Altogether, this field of AI application shows great potential and should be examined further in future studies.
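The abstract describes computing agreement between raters (physicians and apps) with a nonparametric Spearman’s correlation on per-case outcomes. As a minimal illustration of that kind of analysis, here is a pure-Python sketch of Spearman’s ρ with tie-averaged ranks, applied to hypothetical binary correct/incorrect vectors (the study’s actual vignette-level data are not reproduced here; the example data and function names are assumptions for illustration only):

```python
from statistics import mean

def average_ranks(values):
    # Assign 1-based ranks, averaging ranks within groups of ties
    # (ties are pervasive with binary correct/incorrect data).
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    # Spearman's rho = Pearson correlation of the two rank vectors.
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical per-case correctness (1 = correct diagnosis) for
# one physician and one symptom checker app over ten vignettes.
physician = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
app       = [1, 0, 0, 1, 0, 0, 1, 0, 1, 0]
print(round(spearman_rho(physician, app), 3))  # → 0.535
```

For binary data this tie-averaged Spearman’s ρ coincides with the phi coefficient of the 2×2 agreement table; in practice one would typically use `scipy.stats.spearmanr`, which also returns a p-value for the significance threshold the study uses.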
format Article
id doaj-art-09387dcb8a524c29b66c81cc8248ba68
institution Kabale University
issn 2075-4418
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Diagnostics
spelling doaj-art-09387dcb8a524c29b66c81cc8248ba68 (indexed 2025-01-24T13:29:09Z, eng)
published MDPI AG, Diagnostics (ISSN 2075-4418), 2025-01-01, Vol. 15, Issue 2, Article 221
doi 10.3390/diagnostics15020221
affiliations Tobias Gehlen, Philipp Solbrig, Katharina Stahnke, Robert Zahn, Markus Jahn, Dominik Adl Amini, David Alexander Back: Center for Musculoskeletal Surgery, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, 13353 Berlin, Germany; Theresa Joost: Sports Medicine & Sports Orthopedics, University Outpatient Clinic, University of Potsdam, 14469 Potsdam, Germany
title Accuracy of Artificial Intelligence Based Chatbots in Analyzing Orthopedic Pathologies: An Experimental Multi-Observer Analysis
topic orthopedics
traumatology
chatbots
artificial intelligence
symptoms
mobile health