Benchmarking the Confidence of Large Language Models in Answering Clinical Questions: Cross-Sectional Evaluation Study

Abstract: Background: The capabilities of large language models (LLMs) to self-assess their own confidence in answering questions within the biomedical realm remain underexplored. Objective: This study evaluates the confidence levels of 12 LLMs across 5 medical specialt...

Bibliographic Details
Main Authors:	Mahmud Omar, Reem Agbareia, Benjamin S Glicksberg, Girish N Nadkarni, Eyal Klang
Format:	Article
Language:	English
Published:	JMIR Publications 2025-05-01
Series:	JMIR Medical Informatics
Online Access:	https://medinform.jmir.org/2025/1/e66917

Similar Items