SumGPT: A Multimodal Framework for Radiology Report Summarization to Improve Clinical Performance

Radiology report summarization plays a critical role in medical imaging, addressing the growing need for concise and accessible interpretation of complex radiology findings. However, existing models often fail to fully leverage the potential of multimodal data integration. In this study, we propose...

Bibliographic Details
Main Authors:	Tipu Sultan, Mohammad Abu Tareq Rony, Mohammad Shariful Islam, Samah Alshathri, Walid El-Shafai
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Radiology, multimodal, report summarization, large language models, VisualGPT
Online Access:	https://ieeexplore.ieee.org/document/10836737/

author	Tipu Sultan, Mohammad Abu Tareq Rony, Mohammad Shariful Islam, Samah Alshathri, Walid El-Shafai
collection	DOAJ
description	Radiology report summarization plays a critical role in medical imaging, addressing the growing need for concise and accessible interpretation of complex radiology findings. However, existing models often fail to fully leverage the potential of multimodal data integration. In this study, we propose a novel model, SumGPT, which integrates T5 with a Vision Transformer to harness the power of transformer-based architectures for enhanced radiology report summarization. The dataset used in this study comprises 1,952 radiology images with detailed textual reports for training and 488 images with reports for testing. SumGPT was evaluated against several baseline models, including BERT + EfficientNet, XLM-RoBERTa + ViT, T5 + CLIP, VisualGPT (GPT-2 + ViT), and others, using a dataset explicitly designed for this task. The experimental results indicate that SumGPT outperformed all baseline models, achieving the highest performance across all metrics. Specifically, it attained a ROUGE-1 score of 0.8514, ROUGE-2 of 0.8471, ROUGE-L of 0.8514, and a BLEU score of 0.8470. The results demonstrate that SumGPT effectively produces clear and accurate summaries of radiology reports. Combining a Vision Transformer (ViT) with a language model enhances the model's ability to capture detailed information. The study also shows that SumGPT performs well with different types of reports and could be beneficial in other areas, such as pathology and cardiology. In the future, this approach could pave the way for applications in other medical domains while further optimizing the model for real-time clinical use.
format	Article
id	doaj-art-f4fc08788fd14d71b2a0bdb36147eab1
institution	Kabale University
issn	2169-3536
language	English
publishDate	2025-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj-art-f4fc08788fd14d71b2a0bdb36147eab1; 2025-01-29T00:01:20Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 15929-15945; DOI 10.1109/ACCESS.2025.3528335; article no. 10836737
	Tipu Sultan, https://orcid.org/0009-0002-8607-0386
	Mohammad Abu Tareq Rony, https://orcid.org/0000-0002-0640-1425
	Mohammad Shariful Islam, https://orcid.org/0009-0007-8442-1425
	Samah Alshathri, https://orcid.org/0000-0002-8805-7890
	Walid El-Shafai, https://orcid.org/0000-0001-7509-2120
	Department of Aerospace and Mechanical Engineering, Saint Louis University, St. Louis, MO, USA
	Department of Statistics, Noakhali Science and Technology University, Noakhali, Bangladesh
	Department of Computer Science and Telecommunication Engineering, Noakhali Science and Technology University, Noakhali, Bangladesh
	Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh, Saudi Arabia
	Computer Science Department, Automated Systems and Soft Computing Laboratory (ASSCL), Prince Sultan University, Riyadh, Saudi Arabia
title	SumGPT: A Multimodal Framework for Radiology Report Summarization to Improve Clinical Performance
topic	Radiology, multimodal, report summarization, large language models, VisualGPT
url	https://ieeexplore.ieee.org/document/10836737/

Similar Items