The use of large language models in detecting Chinese ultrasound report errors
Abstract This retrospective study evaluated the efficacy of large language models (LLMs) in improving the accuracy of Chinese ultrasound reports. Data from three hospitals (January-April 2024) including 400 reports with 243 errors across six categories were analyzed. Three GPT versions and Claude 3....
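Main Authors: | Yuqi Yan, Kai Wang, Bojian Feng, Jincao Yao, Tian Jiang, Zhiyan Jin, Yin Zheng, Yahan Zhou, Chen Chen, Lin Sui, Xiayi Chen, Yanhong Du, Jie Yang, Qianmeng Pan, Lingyan Zhou, Vicky Yang Wang, Ping Liang, Dong Xu |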
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2025-01-01 |
Series: | npj Digital Medicine |
Online Access: | https://doi.org/10.1038/s41746-025-01468-7 |
_version_ | 1832571349402386432 |
---|---|
author | Yuqi Yan Kai Wang Bojian Feng Jincao Yao Tian Jiang Zhiyan Jin Yin Zheng Yahan Zhou Chen Chen Lin Sui Xiayi Chen Yanhong Du Jie Yang Qianmeng Pan Lingyan Zhou Vicky Yang Wang Ping Liang Dong Xu |
author_sort | Yuqi Yan |
collection | DOAJ |
description | Abstract This retrospective study evaluated the efficacy of large language models (LLMs) in improving the accuracy of Chinese ultrasound reports. Data from three hospitals (January-April 2024), comprising 400 reports with 243 errors across six categories, were analyzed. Three GPT versions and Claude 3.5 Sonnet were tested in zero-shot settings, with the top two models further assessed in few-shot scenarios. Six radiologists of varying experience levels performed error detection on a randomly selected test set. In the zero-shot setting, Claude 3.5 Sonnet and GPT-4o achieved the highest error detection rates (52.3% and 41.2%, respectively). In the few-shot setting, Claude 3.5 Sonnet outperformed senior and resident radiologists, while GPT-4o excelled in spelling error detection. LLMs processed reports faster than the quickest radiologist (Claude 3.5 Sonnet: 13.2 s, GPT-4o: 15.0 s, radiologist: 42.0 s per report). This study demonstrates the potential of LLMs to enhance ultrasound report accuracy, outperforming human experts in certain aspects. |
format | Article |
id | doaj-art-fee2950dbbd345ba90e793eb8ce0cc71 |
institution | Kabale University |
issn | 2398-6352 |
language | English |
publishDate | 2025-01-01 |
publisher | Nature Portfolio |
record_format | Article |
series | npj Digital Medicine |
spelling | doaj-art-fee2950dbbd345ba90e793eb8ce0cc71; 2025-02-02T12:43:44Z; eng; Nature Portfolio; npj Digital Medicine; 2398-6352; 2025-01-01; 81113; 10.1038/s41746-025-01468-7; The use of large language models in detecting Chinese ultrasound report errors; Yuqi Yan [0]; Kai Wang [1]; Bojian Feng [2]; Jincao Yao [3]; Tian Jiang [4]; Zhiyan Jin [5]; Yin Zheng [6]; Yahan Zhou [7]; Chen Chen [8]; Lin Sui [9]; Xiayi Chen [10]; Yanhong Du [11]; Jie Yang [12]; Qianmeng Pan [13]; Lingyan Zhou [14]; Vicky Yang Wang [15]; Ping Liang [16]; Dong Xu [17]; Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital; Department of Ultrasound, The Affiliated Dongyang Hospital of Wenzhou Medical University; Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital; Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital; Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital; Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital; Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital; Center of Intelligent Diagnosis and Therapy (Taizhou), Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences; Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital; Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital; Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital; Department of Ultrasound, The Affiliated Dongyang Hospital of Wenzhou Medical University; Department of Ultrasound, The Affiliated Dongyang Hospital of Wenzhou Medical University; Department of Ultrasound, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital); Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital; Center of Intelligent Diagnosis and Therapy (Taizhou), Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences; Department of Ultrasound, Chinese PLA General Hospital, Chinese PLA Medical School; Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital; Abstract This retrospective study evaluated the efficacy of large language models (LLMs) in improving the accuracy of Chinese ultrasound reports. Data from three hospitals (January-April 2024) including 400 reports with 243 errors across six categories were analyzed. Three GPT versions and Claude 3.5 Sonnet were tested in zero-shot settings, with the top two models further assessed in few-shot scenarios. Six radiologists of varying experience levels performed error detection on a randomly selected test set. In zero-shot setting, Claude 3.5 Sonnet and GPT-4o achieved the highest error detection rates (52.3% and 41.2%, respectively). In few-shot, Claude 3.5 Sonnet outperformed senior and resident radiologists, while GPT-4o excelled in spelling error detection. LLMs processed reports faster than the quickest radiologist (Claude 3.5 Sonnet: 13.2 s, GPT-4o: 15.0 s, radiologist: 42.0 s per report). This study demonstrates the potential of LLMs to enhance ultrasound report accuracy, outperforming human experts in certain aspects.; https://doi.org/10.1038/s41746-025-01468-7 |
spellingShingle | Yuqi Yan Kai Wang Bojian Feng Jincao Yao Tian Jiang Zhiyan Jin Yin Zheng Yahan Zhou Chen Chen Lin Sui Xiayi Chen Yanhong Du Jie Yang Qianmeng Pan Lingyan Zhou Vicky Yang Wang Ping Liang Dong Xu The use of large language models in detecting Chinese ultrasound report errors npj Digital Medicine |
title | The use of large language models in detecting Chinese ultrasound report errors |
title_sort | use of large language models in detecting chinese ultrasound report errors |
url | https://doi.org/10.1038/s41746-025-01468-7 |