Cross sectional pilot study on clinical review generation using large language models


Bibliographic Details
Main Authors: Zining Luo, Yang Qiao, Xinyu Xu, Xiangyu Li, Mengyan Xiao, Aijia Kang, Dunrui Wang, Yueshan Pang, Xing Xie, Sijun Xie, Dachen Luo, Xuefeng Ding, Zhenglong Liu, Ying Liu, Aimin Hu, Yixing Ren, Jiebin Xie
Format: Article
Language: English
Published: Nature Portfolio 2025-03-01
Series: npj Digital Medicine
Online Access:https://doi.org/10.1038/s41746-025-01535-z
Description
Summary: As the growth of medical literature accelerates, necessitating efficient tools to synthesize evidence for clinical practice and research, interest in leveraging large language models (LLMs) to generate clinical reviews has surged. However, there are significant concerns about the reliability of integrating LLMs into the clinical review process. This study presents a systematic comparison between LLM-generated and human-authored clinical reviews, revealing that while AI can produce reviews quickly, those reviews often contain fewer references, less comprehensive insights, and lower logical consistency, and their citations exhibit lower authenticity and accuracy. In addition, a higher proportion of their references come from lower-tier journals. The study also uncovers a concerning inefficiency in current detection systems at identifying AI-generated content, suggesting a need for more advanced checking systems and a stronger ethical framework to ensure academic transparency. Addressing these challenges is vital for the responsible integration of LLMs into clinical research.
ISSN: 2398-6352