Clinical entity augmented retrieval for clinical information extraction

Abstract Large language models (LLMs) with retrieval-augmented generation (RAG) have improved information extraction over previous methods, yet their reliance on embeddings often leads to inefficient retrieval. We introduce CLinical Entity Augmented Retrieval (CLEAR), a RAG pipeline that retrieves information using entities. We compared CLEAR to embedding RAG and full-note approaches for extracting 18 variables using six LLMs across 20,000 clinical notes. Average F1 scores were 0.90, 0.86, and 0.79; inference times were 4.95, 17.41, and 20.08 s per note; average model queries were 1.68, 4.94, and 4.18 per note; and average input tokens were 1.1k, 3.8k, and 6.1k per note for CLEAR, embedding RAG, and full-note approaches, respectively. In conclusion, CLEAR utilizes clinical entities for information retrieval and achieves >70% reduction in token usage and inference time with improved performance compared to modern methods.

Bibliographic Details
Main Authors: Ivan Lopez, Akshay Swaminathan, Karthik Vedula, Sanjana Narayanan, Fateme Nateghi Haredasht, Stephen P. Ma, April S. Liang, Steven Tate, Manoj Maddali, Robert Joseph Gallo, Nigam H. Shah, Jonathan H. Chen
Format: Article
Language: English
Published: Nature Portfolio, 2025-01-01
Series: npj Digital Medicine
ISSN: 2398-6352
Online Access: https://doi.org/10.1038/s41746-024-01377-1

Author Affiliations:
Ivan Lopez: Stanford University School of Medicine
Akshay Swaminathan: Stanford University School of Medicine
Karthik Vedula: Poolesville High School
Sanjana Narayanan: Stanford Center for Biomedical Informatics Research
Fateme Nateghi Haredasht: Stanford Center for Biomedical Informatics Research
Stephen P. Ma: Division of Hospital Medicine, Stanford University School of Medicine
April S. Liang: Division of Clinical Informatics, Stanford University School of Medicine
Steven Tate: Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine
Manoj Maddali: Department of Biomedical Data Science
Robert Joseph Gallo: Center for Innovation to Implementation, VA Palo Alto Healthcare System
Nigam H. Shah: Stanford Center for Biomedical Informatics Research
Jonathan H. Chen: Department of Biomedical Data Science
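
The abstract contrasts entity-based retrieval with embedding-based RAG: rather than ranking note chunks by vector similarity, CLEAR retrieves only the snippets that mention clinical entities relevant to the target variable, which is what drives the reported reductions in input tokens, model queries, and inference time. The Python sketch below illustrates that retrieval idea under simplifying assumptions; the sentence splitter, entity list, regex matching, and windowing rule are illustrative stand-ins, not the authors' implementation.

```python
import re

def split_sentences(note: str) -> list[str]:
    """Naive sentence splitter; a real clinical pipeline would use a proper tokenizer."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+|\n+", note) if s.strip()]

def retrieve_by_entities(note: str, entities: list[str], window: int = 1) -> str:
    """Keep only sentences mentioning a target entity, plus `window` neighbors for context."""
    sentences = split_sentences(note)
    patterns = [re.compile(rf"\b{re.escape(e)}\b", re.IGNORECASE) for e in entities]
    keep: set[int] = set()
    for i, sentence in enumerate(sentences):
        if any(p.search(sentence) for p in patterns):
            keep.update(range(max(0, i - window), min(len(sentences), i + window + 1)))
    return " ".join(sentences[i] for i in sorted(keep))

# Hypothetical usage for a "smoking status" variable: only entity-bearing
# snippets reach the LLM, rather than the full note.
note = ("Pt is a 64yo M with COPD. Denies chest pain. "
        "Former smoker, quit 2010, 30 pack-years. Lungs clear today.")
print(retrieve_by_entities(note, ["smoker", "tobacco", "pack-years"]))
# -> Denies chest pain. Former smoker, quit 2010, 30 pack-years. Lungs clear today.
```

In a full pipeline, the retrieved snippet, not the whole note, would be placed in the LLM prompt for extraction, which is consistent with the abstract's figure of roughly 1.1k input tokens per note for CLEAR versus 6.1k for full notes.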