Accuracy of Artificial Intelligence Based Chatbots in Analyzing Orthopedic Pathologies: An Experimental Multi-Observer Analysis
Main Authors: | Tobias Gehlen, Theresa Joost, Philipp Solbrig, Katharina Stahnke, Robert Zahn, Markus Jahn, Dominik Adl Amini, David Alexander Back |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-01-01 |
Series: | Diagnostics |
Subjects: | orthopedics; traumatology; chatbots; artificial intelligence; symptoms; mobile health |
Online Access: | https://www.mdpi.com/2075-4418/15/2/221 |
_version_ | 1832588669938040832 |
---|---|
author | Tobias Gehlen Theresa Joost Philipp Solbrig Katharina Stahnke Robert Zahn Markus Jahn Dominik Adl Amini David Alexander Back |
author_facet | Tobias Gehlen Theresa Joost Philipp Solbrig Katharina Stahnke Robert Zahn Markus Jahn Dominik Adl Amini David Alexander Back |
author_sort | Tobias Gehlen |
collection | DOAJ |
description | <b>Background and Objective:</b> The rapid development of artificial intelligence (AI) is impacting the medical sector by offering new possibilities for faster and more accurate diagnoses. Symptom checker apps show potential for supporting patient decision-making in this regard. Whether the AI-based decision-making of symptom checker apps outperforms physicians in diagnostic accuracy and urgency assessment remains unclear. Therefore, this study aimed to investigate the performance of existing symptom checker apps in orthopedic and traumatology cases compared to physicians in the field. <b>Methods:</b> Thirty fictitious case vignettes of common conditions in trauma surgery and orthopedics were retrospectively examined by four orthopedic and traumatology specialists and four different symptom checker apps for diagnostic accuracy and the recommended urgency of measures. Based on the estimations provided by the doctors and the individual symptom checker apps, the percentage of correct diagnoses and of appropriate assessments of treatment urgency was calculated as mean ± standard deviation (SD), in %. Data were analyzed statistically for accuracy and for correlation between the apps and physicians using a nonparametric Spearman’s correlation test (<i>p</i> < 0.05). <b>Results:</b> The physicians provided the correct diagnosis in 84.4 ± 18.4% of cases (range: 53.3 to 96.7%), and the symptom checker apps in 35.8 ± 1.0% of cases (range: 26.7 to 54.2%). Agreement on diagnostic accuracy varied from low to high (physicians vs. physicians: Spearman’s ρ = 0.143 to 0.538; physicians vs. apps: Spearman’s ρ = 0.007 to 0.358), depending on the individual physicians and apps. Across all cases, the physicians correctly assessed the urgency level in 70.0 ± 4.7% (range: 66.7 to 73.3%) and the apps in 20.6 ± 5.6% (range: 10.8 to 37.5%) of cases. Agreement on the accuracy of urgency estimates was moderate to high between and within physicians and individual apps. <b>Conclusions:</b> AI-based symptom checker apps for diagnosis in orthopedics and traumatology do not yet provide a more accurate analysis of diagnosis and urgency than physicians. However, accuracy varies broadly between different digital tools. Altogether, this field of AI application shows considerable potential and should be further examined in future studies. |
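The agreement figures in the abstract are Spearman rank correlations computed over per-case correctness. As an illustrative sketch only (this is not the authors' analysis code, and the case vectors below are hypothetical), Spearman's ρ can be computed as the Pearson correlation of average ranks, which handles the many ties that binary correct/incorrect data produce:

```python
def average_ranks(xs):
    """Assign 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        # extend j over the run of values tied with xs[order[i]]
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-case correctness (1 = correct diagnosis) for one
# physician and one app over six vignettes -- illustration only.
physician = [1, 1, 1, 0, 1, 1]
app = [1, 0, 1, 0, 0, 1]
print(round(spearman_rho(physician, app), 3))  # -> 0.447
```

With 30 vignettes per rater, the study's pairwise ρ values (e.g. physicians vs. apps: 0.007 to 0.358) would be obtained by running such a comparison for each rater pair.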
format | Article |
id | doaj-art-09387dcb8a524c29b66c81cc8248ba68 |
institution | Kabale University |
issn | 2075-4418 |
language | English |
publishDate | 2025-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Diagnostics |
spelling | doaj-art-09387dcb8a524c29b66c81cc8248ba68 | 2025-01-24T13:29:09Z | eng | MDPI AG | Diagnostics | 2075-4418 | 2025-01-01 | Vol. 15, Issue 2, Article 221 | 10.3390/diagnostics15020221 | Accuracy of Artificial Intelligence Based Chatbots in Analyzing Orthopedic Pathologies: An Experimental Multi-Observer Analysis |
Affiliations: Theresa Joost: Sports Medicine & Sports Orthopedics, University Outpatient Clinic, University of Potsdam, 14469 Potsdam, Germany. All other authors (Tobias Gehlen, Philipp Solbrig, Katharina Stahnke, Robert Zahn, Markus Jahn, Dominik Adl Amini, David Alexander Back): Center for Musculoskeletal Surgery, Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, 13353 Berlin, Germany. |
https://www.mdpi.com/2075-4418/15/2/221 | orthopedics; traumatology; chatbots; artificial intelligence; symptoms; mobile health |
spellingShingle | Tobias Gehlen Theresa Joost Philipp Solbrig Katharina Stahnke Robert Zahn Markus Jahn Dominik Adl Amini David Alexander Back Accuracy of Artificial Intelligence Based Chatbots in Analyzing Orthopedic Pathologies: An Experimental Multi-Observer Analysis Diagnostics orthopedics traumatology chatbots artificial intelligence symptoms mobile health |
title | Accuracy of Artificial Intelligence Based Chatbots in Analyzing Orthopedic Pathologies: An Experimental Multi-Observer Analysis |
title_full | Accuracy of Artificial Intelligence Based Chatbots in Analyzing Orthopedic Pathologies: An Experimental Multi-Observer Analysis |
title_fullStr | Accuracy of Artificial Intelligence Based Chatbots in Analyzing Orthopedic Pathologies: An Experimental Multi-Observer Analysis |
title_full_unstemmed | Accuracy of Artificial Intelligence Based Chatbots in Analyzing Orthopedic Pathologies: An Experimental Multi-Observer Analysis |
title_short | Accuracy of Artificial Intelligence Based Chatbots in Analyzing Orthopedic Pathologies: An Experimental Multi-Observer Analysis |
title_sort | accuracy of artificial intelligence based chatbots in analyzing orthopedic pathologies an experimental multi observer analysis |
topic | orthopedics; traumatology; chatbots; artificial intelligence; symptoms; mobile health |
url | https://www.mdpi.com/2075-4418/15/2/221 |
work_keys_str_mv | AT tobiasgehlen accuracyofartificialintelligencebasedchatbotsinanalyzingorthopedicpathologiesanexperimentalmultiobserveranalysis AT theresajoost accuracyofartificialintelligencebasedchatbotsinanalyzingorthopedicpathologiesanexperimentalmultiobserveranalysis AT philippsolbrig accuracyofartificialintelligencebasedchatbotsinanalyzingorthopedicpathologiesanexperimentalmultiobserveranalysis AT katharinastahnke accuracyofartificialintelligencebasedchatbotsinanalyzingorthopedicpathologiesanexperimentalmultiobserveranalysis AT robertzahn accuracyofartificialintelligencebasedchatbotsinanalyzingorthopedicpathologiesanexperimentalmultiobserveranalysis AT markusjahn accuracyofartificialintelligencebasedchatbotsinanalyzingorthopedicpathologiesanexperimentalmultiobserveranalysis AT dominikadlamini accuracyofartificialintelligencebasedchatbotsinanalyzingorthopedicpathologiesanexperimentalmultiobserveranalysis AT davidalexanderback accuracyofartificialintelligencebasedchatbotsinanalyzingorthopedicpathologiesanexperimentalmultiobserveranalysis |