Benchmarking the Confidence of Large Language Models in Answering Clinical Questions: Cross-Sectional Evaluation Study

Abstract
Background: The capabilities of large language models (LLMs) to self-assess their own confidence in answering questions within the biomedical realm remain underexplored.
Objective: This study evaluates the confidence levels of 12 LLMs across 5 medical specialt...

Bibliographic Details
Main Authors:	Mahmud Omar, Reem Agbareia, Benjamin S Glicksberg, Girish N Nadkarni, Eyal Klang
Format:	Article
Language:	English
Published:	JMIR Publications 2025-05-01
Series:	JMIR Medical Informatics
Online Access:	https://medinform.jmir.org/2025/1/e66917

Similar Items

Large language models in medicine: A review of current clinical trials across healthcare applications.
by: Mahmud Omar, et al.
Published: (2024-11-01)

Multimodal LLMs for retinal disease diagnosis via OCT: few-shot versus single-shot learning
by: Reem Agbareia, et al.
Published: (2025-05-01)

Applications of Artificial Intelligence in Vasculitides: A Systematic Review
by: Mahmud Omar, et al.
Published: (2025-03-01)

Pitfalls of large language models in medical ethics reasoning
by: Shelly Soffer, et al.
Published: (2025-07-01)

Performance of Large Language Models in Numerical Versus Semantic Medical Knowledge: Cross-Sectional Benchmarking Study on Evidence-Based Questions and Answers
by: Eden Avnat, et al.
Published: (2025-07-01)

P322: The role of large language models in medical genetics
by: Rona Merdler-Rabinowicz, et al.
Published: (2025-01-01)

Benchmarking OpenAI's APIs and Large Language Models for Repeatable, Efficient Question Answering Across Multiple Documents
by: Elena Filipovska, et al.
Published: (2024-10-01)

A strategy for cost-effective large language model use at health system-scale
by: Eyal Klang, et al.
Published: (2024-11-01)

KoBBQ: Korean Bias Benchmark for Question Answering
by: Jiho Jin, et al.
Published: (2024-05-01)

Accuracy of large language models for answering pediatric preventive dentistry questions
by: GUAN Boyan, XU Minghe, ZHANG Huiqi, MA Shulei, ZHANG Shanshan, ZHAO Junfeng
Published: (2025-04-01)

Evaluating search engines and large language models for answering health questions
by: Marcos Fernández-Pichel, et al.
Published: (2025-03-01)

Multi-model assurance analysis showing large language models are highly vulnerable to adversarial hallucination attacks during clinical decision support
by: Mahmud Omar, et al.
Published: (2025-08-01)

Large language models in medical education: a comparative cross-platform evaluation in answering histological questions
by: Volodymyr Mavrych, et al.
Published: (2025-12-01)

How valuable are the questions and answers generated by large language models in oral and maxillofacial surgery?
by: Kyuhyung Kim, et al.
Published: (2025-01-01)

Hajj-FQA: A benchmark Arabic dataset for developing question-answering systems on Hajj fatwas
by: Hayfa A. Aleid, et al.
Published: (2025-07-01)

Evaluation of Large Language Model Performance in Answering Clinical Questions on Periodontal Furcation Defect Management
by: Georgios S. Chatzopoulos, et al.
Published: (2025-06-01)

Large Language Model Synergy for Ensemble Learning in Medical Question Answering: Design and Evaluation Study
by: Han Yang, et al.
Published: (2025-07-01)

Intelligent accounting question-answering robot based on a large language model and knowledge graph
by: Shi Shengyun, et al.
Published: (2025-04-01)

Benchmarking open-source large language models on Portuguese Revalida multiple-choice questions
by: João Victor Bruneti Severino, et al.
Published: (2025-02-01)

Passage Retrieval in question answering systems in Polish language
by: Anna Pacanowska
Published: (2023-09-01)

Questions (No Answers)
Published: (2025-06-01)

Questionable Answers in Question Answering Research: Reproducibility and Variability of Published Results
by: Matt Crane
Published: (2021-03-01)

Accuracy of latest large language models in answering multiple choice questions in dentistry: A comparative study.
by: Huy Cong Nguyen, et al.
Published: (2025-01-01)

A clinician-based comparative study of large language models in answering medical questions: the case of asthma
by: Yong Yin, et al.
Published: (2025-04-01)

Large-scale Semantic Parsing without Question-Answer Pairs
by: Siva Reddy, et al.
Published: (2021-03-01)

Open-Domain Question Answering from Large Text Collections
by: Marius Paşca
Published: (2021-03-01)

MOODLE IN LANGUAGE TEACHING AND TESTING. THE EMBEDDED ANSWERS QUESTION TYPE
by: Ioana-Claudia Horea
Published: (2025-03-01)

A question-answering framework for geospatial data retrieval enhanced by a knowledge graph and large language models
by: Hao Li, et al.
Published: (2025-08-01)

Answering real-world clinical questions using large language model, retrieval-augmented generation, and agentic systems
by: Yen Sia Low, et al.
Published: (2025-06-01)

Intelligent question answering for water conservancy project inspection driven by knowledge graph and large language model collaboration
by: Yangrui Yang, et al.
Published: (2024-12-01)

Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis
by: Ling Wang, et al.
Published: (2025-04-01)

Multimodal representative answer extraction in community question answering
by: Ming Li, et al.
Published: (2023-10-01)

A topic clustering approach to finding similar questions from large question and answer archives.
by: Wei-Nan Zhang, et al.
Published: (2014-01-01)

Evaluating large language models as graders of medical short answer questions: a comparative analysis with expert human graders
by: Olena Bolgova, et al.
Published: (2025-12-01)

Osteosarcoma knowledge graph question answering system: deep learning-based knowledge graph and large language model fusion
by: Lulu Zhang, et al.
Published: (2025-05-01)

Some Answers, Some Questions
by: Nick R Anthonisen
Published: (2002-01-01)

HOW TO ANSWER CHILDREN QUESTIONS
by: O. Brenifier
Published: (2016-03-01)

The role of answer content and length when preparing answers to questions
by: Ruth Elizabeth Corps, et al.
Published: (2024-07-01)

The history of development of the cultural-historical theory and its contemporary perceptions: answering questions and questioning answers
by: Nikolai N. Veresov
Published: (2024-12-01)

Benchmarking Large Language Models for News Summarization
by: Tianyi Zhang, et al.
Published: (2024-02-01)