Clinical entity augmented retrieval for clinical information extraction
Abstract: Large language models (LLMs) with retrieval-augmented generation (RAG) have improved information extraction over previous methods, yet their reliance on embeddings often leads to inefficient retrieval. We introduce CLinical Entity Augmented Retrieval (CLEAR), a RAG pipeline that retrieves information using entities. We compared CLEAR to embedding RAG and full-note approaches for extracting 18 variables using six LLMs across 20,000 clinical notes. Average F1 scores were 0.90, 0.86, and 0.79; inference times were 4.95, 17.41, and 20.08 s per note; average model queries were 1.68, 4.94, and 4.18 per note; and average input tokens were 1.1k, 3.8k, and 6.1k per note for CLEAR, embedding RAG, and full-note approaches, respectively. In conclusion, CLEAR utilizes clinical entities for information retrieval and achieves >70% reduction in token usage and inference time with improved performance compared to modern methods.
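The abstract describes entity-driven retrieval only at a high level. The sketch below is a minimal illustration of the general idea, not the authors' CLEAR pipeline: it uses plain string matching as a stand-in for clinical entity recognition, and the note text, entity list, and function names are hypothetical.

```python
import re

# Minimal sketch of entity-based retrieval for clinical information extraction.
# NOT the authors' CLEAR implementation: entity matching here is simple regex
# lookup, and the example note and entity list are invented for illustration.

def split_sentences(note: str) -> list[str]:
    """Naive sentence splitter for clinical note text."""
    return [s.strip() for s in re.split(r"(?<=[.;\n])\s+", note) if s.strip()]

def retrieve_by_entities(note: str, entities: list[str]) -> list[str]:
    """Keep only sentences that mention at least one query entity."""
    pattern = re.compile("|".join(re.escape(e) for e in entities), re.IGNORECASE)
    return [s for s in split_sentences(note) if pattern.search(s)]

def build_prompt(variable: str, snippets: list[str]) -> str:
    """Assemble a compact extraction prompt from the retrieved snippets only,
    instead of sending the full note to the LLM."""
    context = "\n".join(f"- {s}" for s in snippets)
    return (
        f"Extract the value of '{variable}' from the snippets below. "
        f"Answer 'not documented' if absent.\n{context}"
    )

if __name__ == "__main__":
    note = (
        "Patient admitted with community-acquired pneumonia. "
        "Ejection fraction 55% on recent echo. "
        "Denies tobacco use. Continue home metformin."
    )
    # Hypothetical query entities (synonyms/abbreviations) for one variable.
    entities = ["ejection fraction", "EF", "LVEF"]
    print(build_prompt("ejection fraction", retrieve_by_entities(note, entities)))
```

Because only entity-bearing snippets reach the model, the prompt stays short, which is consistent with the token and latency reductions reported in the abstract.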
Main Authors: | Ivan Lopez, Akshay Swaminathan, Karthik Vedula, Sanjana Narayanan, Fateme Nateghi Haredasht, Stephen P. Ma, April S. Liang, Steven Tate, Manoj Maddali, Robert Joseph Gallo, Nigam H. Shah, Jonathan H. Chen |
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2025-01-01 |
Series: | npj Digital Medicine |
Online Access: | https://doi.org/10.1038/s41746-024-01377-1 |
author | Ivan Lopez, Akshay Swaminathan, Karthik Vedula, Sanjana Narayanan, Fateme Nateghi Haredasht, Stephen P. Ma, April S. Liang, Steven Tate, Manoj Maddali, Robert Joseph Gallo, Nigam H. Shah, Jonathan H. Chen |
author_sort | Ivan Lopez |
collection | DOAJ |
description | Abstract Large language models (LLMs) with retrieval-augmented generation (RAG) have improved information extraction over previous methods, yet their reliance on embeddings often leads to inefficient retrieval. We introduce CLinical Entity Augmented Retrieval (CLEAR), a RAG pipeline that retrieves information using entities. We compared CLEAR to embedding RAG and full-note approaches for extracting 18 variables using six LLMs across 20,000 clinical notes. Average F1 scores were 0.90, 0.86, and 0.79; inference times were 4.95, 17.41, and 20.08 s per note; average model queries were 1.68, 4.94, and 4.18 per note; and average input tokens were 1.1k, 3.8k, and 6.1k per note for CLEAR, embedding RAG, and full-note approaches, respectively. In conclusion, CLEAR utilizes clinical entities for information retrieval and achieves >70% reduction in token usage and inference time with improved performance compared to modern methods. |
format | Article |
id | doaj-art-cdd275f794e24fb496f700d16a907fbc |
institution | Kabale University |
issn | 2398-6352 |
language | English |
publishDate | 2025-01-01 |
publisher | Nature Portfolio |
record_format | Article |
series | npj Digital Medicine |
affiliations | Ivan Lopez (Stanford University School of Medicine); Akshay Swaminathan (Stanford University School of Medicine); Karthik Vedula (Poolesville High School); Sanjana Narayanan (Stanford Center for Biomedical Informatics Research); Fateme Nateghi Haredasht (Stanford Center for Biomedical Informatics Research); Stephen P. Ma (Division of Hospital Medicine, Stanford University School of Medicine); April S. Liang (Division of Clinical Informatics, Stanford University School of Medicine); Steven Tate (Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine); Manoj Maddali (Department of Biomedical Data Science); Robert Joseph Gallo (Center for Innovation to Implementation, VA Palo Alto Healthcare System); Nigam H. Shah (Stanford Center for Biomedical Informatics Research); Jonathan H. Chen (Department of Biomedical Data Science) |
title | Clinical entity augmented retrieval for clinical information extraction |
url | https://doi.org/10.1038/s41746-024-01377-1 |