The use of large language models in detecting Chinese ultrasound report errors

Bibliographic Details
Main Authors: Yuqi Yan, Kai Wang, Bojian Feng, Jincao Yao, Tian Jiang, Zhiyan Jin, Yin Zheng, Yahan Zhou, Chen Chen, Lin Sui, Xiayi Chen, Yanhong Du, Jie Yang, Qianmeng Pan, Lingyan Zhou, Vicky Yang Wang, Ping Liang, Dong Xu
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: npj Digital Medicine
Online Access: https://doi.org/10.1038/s41746-025-01468-7
Collection: DOAJ
Abstract: This retrospective study evaluated the efficacy of large language models (LLMs) in improving the accuracy of Chinese ultrasound reports. Data from three hospitals (January-April 2024), comprising 400 reports with 243 errors across six categories, were analyzed. Three GPT versions and Claude 3.5 Sonnet were tested in zero-shot settings, with the top two models further assessed in few-shot scenarios. Six radiologists of varying experience levels performed error detection on a randomly selected test set. In the zero-shot setting, Claude 3.5 Sonnet and GPT-4o achieved the highest error detection rates (52.3% and 41.2%, respectively). In the few-shot setting, Claude 3.5 Sonnet outperformed senior and resident radiologists, while GPT-4o excelled in spelling error detection. LLMs processed reports faster than the quickest radiologist (Claude 3.5 Sonnet: 13.2 s, GPT-4o: 15.0 s, radiologist: 42.0 s per report). This study demonstrates the potential of LLMs to enhance ultrasound report accuracy, outperforming human experts in certain aspects.
ISSN: 2398-6352
Volume, issue, pages: 8 (1), 1-13
Author affiliations:
Yuqi Yan: Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital
Kai Wang: Department of Ultrasound, The Affiliated Dongyang Hospital of Wenzhou Medical University
Bojian Feng: Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital
Jincao Yao: Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital
Tian Jiang: Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital
Zhiyan Jin: Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital
Yin Zheng: Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital
Yahan Zhou: Center of Intelligent Diagnosis and Therapy (Taizhou), Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences
Chen Chen: Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital
Lin Sui: Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital
Xiayi Chen: Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital
Yanhong Du: Department of Ultrasound, The Affiliated Dongyang Hospital of Wenzhou Medical University
Jie Yang: Department of Ultrasound, The Affiliated Dongyang Hospital of Wenzhou Medical University
Qianmeng Pan: Department of Ultrasound, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital)
Lingyan Zhou: Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital
Vicky Yang Wang: Center of Intelligent Diagnosis and Therapy (Taizhou), Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences
Ping Liang: Department of Ultrasound, Chinese PLA General Hospital, Chinese PLA Medical School
Dong Xu: Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital