ChestX-Transcribe: a multimodal transformer for automated radiology report generation from chest x-rays
Radiology departments are under increasing pressure to meet the demand for timely and accurate diagnostics, especially with chest x-rays, a key modality for pulmonary condition assessment. Producing comprehensive and accurate radiological reports is a time-consuming process prone to errors, particul...
Main Authors: | Prateek Singh, Sudhakar Singh |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2025-01-01 |
Series: | Frontiers in Digital Health |
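Subjects: | medical report generation; multimodal transformers; swin transformer; DistilGPT; vision-language models; radiology workflow |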
Online Access: | https://www.frontiersin.org/articles/10.3389/fdgth.2025.1535168/full |
_version_ | 1832592412734652416 |
---|---|
author | Prateek Singh; Sudhakar Singh |
author_facet | Prateek Singh; Sudhakar Singh |
author_sort | Prateek Singh |
collection | DOAJ |
description | Radiology departments are under increasing pressure to meet the demand for timely and accurate diagnostics, especially with chest x-rays, a key modality for pulmonary condition assessment. Producing comprehensive and accurate radiological reports is a time-consuming process prone to errors, particularly in high-volume clinical environments. Automated report generation plays a crucial role in alleviating radiologists' workload, improving diagnostic accuracy, and ensuring consistency. This paper introduces ChestX-Transcribe, a multimodal transformer model that combines the Swin Transformer for extracting high-resolution visual features with DistilGPT for generating clinically relevant, semantically rich medical reports. Trained on the Indiana University Chest x-ray dataset, ChestX-Transcribe demonstrates state-of-the-art performance across BLEU, ROUGE, and METEOR metrics, outperforming prior models in producing clinically meaningful reports. However, the reliance on the Indiana University dataset introduces potential limitations, including selection bias, as the dataset is collected from specific hospitals within the Indiana Network for Patient Care. This may result in underrepresentation of certain demographics or conditions not prevalent in those healthcare settings, potentially skewing model predictions when applied to more diverse populations or different clinical environments. Additionally, the ethical implications of handling sensitive medical data, including patient privacy and data security, are considered. Despite these challenges, ChestX-Transcribe shows promising potential for enhancing real-world radiology workflows by automating the creation of medical reports, reducing diagnostic errors, and improving efficiency. The findings highlight the transformative potential of multimodal transformers in healthcare, with future work focusing on improving model generalizability and optimizing clinical integration. |
format | Article |
id | doaj-art-5eef14aa6c6046f7afb78853757cc7a2 |
institution | Kabale University |
issn | 2673-253X |
language | English |
publishDate | 2025-01-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Digital Health |
spellingShingle | Prateek Singh Sudhakar Singh ChestX-Transcribe: a multimodal transformer for automated radiology report generation from chest x-rays Frontiers in Digital Health medical report generation multimodal transformers swin transformer DistilGPT vision-language models radiology workflow |
title | ChestX-Transcribe: a multimodal transformer for automated radiology report generation from chest x-rays |
title_full | ChestX-Transcribe: a multimodal transformer for automated radiology report generation from chest x-rays |
title_fullStr | ChestX-Transcribe: a multimodal transformer for automated radiology report generation from chest x-rays |
title_full_unstemmed | ChestX-Transcribe: a multimodal transformer for automated radiology report generation from chest x-rays |
title_short | ChestX-Transcribe: a multimodal transformer for automated radiology report generation from chest x-rays |
title_sort | chestx transcribe a multimodal transformer for automated radiology report generation from chest x rays |
topic | medical report generation; multimodal transformers; swin transformer; DistilGPT; vision-language models; radiology workflow |
url | https://www.frontiersin.org/articles/10.3389/fdgth.2025.1535168/full |
work_keys_str_mv | AT prateeksingh chestxtranscribeamultimodaltransformerforautomatedradiologyreportgenerationfromchestxrays AT sudhakarsingh chestxtranscribeamultimodaltransformerforautomatedradiologyreportgenerationfromchestxrays |