Benchmarking the Confidence of Large Language Models in Answering Clinical Questions: Cross-Sectional Evaluation Study
Abstract. Background: The capabilities of large language models (LLMs) to self-assess their own confidence in answering questions within the biomedical realm remain underexplored. Objective: This study evaluates the confidence levels of 12 LLMs across 5 medical specialt...
| Main Authors: | Mahmud Omar, Reem Agbareia, Benjamin S Glicksberg, Girish N Nadkarni, Eyal Klang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | JMIR Publications, 2025-05-01 |
| Series: | JMIR Medical Informatics |
| Online Access: | https://medinform.jmir.org/2025/1/e66917 |
Similar Items
- Large language models in medicine: A review of current clinical trials across healthcare applications.
  by: Mahmud Omar, et al.
  Published: (2024-11-01)
- Multimodal LLMs for retinal disease diagnosis via OCT: few-shot versus single-shot learning
  by: Reem Agbareia, et al.
  Published: (2025-05-01)
- Applications of Artificial Intelligence in Vasculitides: A Systematic Review
  by: Mahmud Omar, et al.
  Published: (2025-03-01)
- Pitfalls of large language models in medical ethics reasoning
  by: Shelly Soffer, et al.
  Published: (2025-07-01)
- Performance of Large Language Models in Numerical Versus Semantic Medical Knowledge: Cross-Sectional Benchmarking Study on Evidence-Based Questions and Answers
  by: Eden Avnat, et al.
  Published: (2025-07-01)