Large Language Models lack essential metacognition for reliable medical reasoning

Abstract: Large Language Models have demonstrated expert-level accuracy on medical board examinations, suggesting their potential for clinical decision support systems. However, their metacognitive abilities, which are crucial for medical decision-making, remain largely unexplored. To address this gap, we developed MetaMedQA, a benchmark that incorporates confidence scores and metacognitive tasks into multiple-choice medical questions. We evaluated twelve models on dimensions including confidence-based accuracy, missing answer recall, and unknown recall. Despite high accuracy on multiple-choice questions, our study revealed significant metacognitive deficiencies across all tested models: they consistently failed to recognize their knowledge limitations and provided confident answers even when the correct option was absent. In this work, we show that current models exhibit a critical disconnect between perceived and actual capabilities in medical reasoning, posing significant risks in clinical settings. Our findings emphasize the need for more robust evaluation frameworks that incorporate metacognitive abilities, which are essential for developing reliable Large Language Model-enhanced clinical decision support systems.
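
The abstract names three evaluation dimensions without defining them. The minimal Python sketch below shows one plausible way such metrics could be computed; the record structure, the 1-5 confidence scale, and the "MISSING"/"UNKNOWN" escape labels are illustrative assumptions, not the authors' published MetaMedQA implementation.

```python
# Hypothetical sketch of the three evaluation dimensions named in the
# abstract. Field names and metric definitions are assumptions for
# illustration, not the authors' implementation.
from dataclasses import dataclass

@dataclass
class Prediction:
    chosen: str          # option the model selected
    correct: str         # ground-truth option, or "MISSING" / "UNKNOWN"
    confidence: int      # model's self-reported confidence, e.g. 1-5

def confidence_based_accuracy(preds, min_confidence=4):
    """Accuracy restricted to answers the model itself rates as confident."""
    confident = [p for p in preds if p.confidence >= min_confidence]
    if not confident:
        return 0.0
    return sum(p.chosen == p.correct for p in confident) / len(confident)

def recall_for(preds, label):
    """Fraction of questions whose ground truth is `label` (the correct
    option was removed, or the question is unanswerable) on which the
    model actually picked the corresponding escape option."""
    relevant = [p for p in preds if p.correct == label]
    if not relevant:
        return 0.0
    return sum(p.chosen == label for p in relevant) / len(relevant)

preds = [
    Prediction(chosen="B", correct="B", confidence=5),
    Prediction(chosen="A", correct="MISSING", confidence=5),      # overconfident
    Prediction(chosen="UNKNOWN", correct="UNKNOWN", confidence=2),
]
print(confidence_based_accuracy(preds))  # 0.5
print(recall_for(preds, "MISSING"))      # 0.0 -> missing answer recall
print(recall_for(preds, "UNKNOWN"))      # 1.0 -> unknown recall
```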

Bibliographic Details
Main Authors: Maxime Griot, Coralie Hemptinne, Jean Vanderdonckt, Demet Yuksel
Format: Article
Language: English
Published: Nature Portfolio, 2025-01-01
Series: Nature Communications
ISSN: 2041-1723
Online Access: https://doi.org/10.1038/s41467-024-55628-6
Affiliations: Institute of NeuroScience, Université catholique de Louvain (Griot, Hemptinne, Yuksel); Louvain Research Institute in Management and Organizations, Université catholique de Louvain (Vanderdonckt)