SumGPT: A Multimodal Framework for Radiology Report Summarization to Improve Clinical Performance

Radiology report summarization plays a critical role in medical imaging, addressing the growing need for concise and accessible interpretation of complex radiology findings. However, existing models often fail to fully leverage the potential of multimodal data integration. In this study, we propose...

Bibliographic Details
Main Authors: Tipu Sultan, Mohammad Abu Tareq Rony, Mohammad Shariful Islam, Samah Alshathri, Walid El-Shafai
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Radiology; multimodal; report summarization; large language models; VisualGPT
Online Access: https://ieeexplore.ieee.org/document/10836737/
_version_ 1832583268565778432
author Tipu Sultan
Mohammad Abu Tareq Rony
Mohammad Shariful Islam
Samah Alshathri
Walid El-Shafai
author_facet Tipu Sultan
Mohammad Abu Tareq Rony
Mohammad Shariful Islam
Samah Alshathri
Walid El-Shafai
author_sort Tipu Sultan
collection DOAJ
description Radiology report summarization plays a critical role in medical imaging, addressing the growing need for concise and accessible interpretation of complex radiology findings. However, existing models often fail to fully leverage the potential of multimodal data integration. In this study, we propose a novel model, SumGPT, which integrates T5 with a Vision Transformer to harness the power of transformer-based architectures for enhanced radiology report summarization. The dataset used in this study comprises 1,952 radiology images with detailed textual reports for training and 488 images with reports for testing. The SumGPT technique was evaluated against several baseline models, including BERT + EfficientNet, XLM-RoBERTa + ViT, T5 + CLIP, VisualGPT (GPT-2 + ViT), and others, using a dataset explicitly designed for this task. The experimental results indicate that SumGPT outperformed all baseline models, achieving the highest performance across all metrics. Specifically, it attained a ROUGE-1 score of 0.8514, ROUGE-2 of 0.8471, ROUGE-L of 0.8514, and a BLEU score of 0.8470. The results demonstrate that SumGPT effectively produces clear and accurate summaries of radiology reports. Combining a Vision Transformer (ViT) with a language model enhances its ability to capture detailed information. The study also shows that SumGPT performs well with different types of reports and could be beneficial in other areas, such as pathology and cardiology. In the future, this approach could pave the way for applications in other medical domains while further optimizing the model for real-time clinical use.
format Article
id doaj-art-f4fc08788fd14d71b2a0bdb36147eab1
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-f4fc08788fd14d71b2a0bdb36147eab1
2025-01-29T00:01:20Z
eng
IEEE
IEEE Access
2169-3536
2025-01-01
13
15929
15945
10.1109/ACCESS.2025.3528335
10836737
SumGPT: A Multimodal Framework for Radiology Report Summarization to Improve Clinical Performance
Tipu Sultan 0 https://orcid.org/0009-0002-8607-0386
Mohammad Abu Tareq Rony 1 https://orcid.org/0000-0002-0640-1425
Mohammad Shariful Islam 2 https://orcid.org/0009-0007-8442-1425
Samah Alshathri 3 https://orcid.org/0000-0002-8805-7890
Walid El-Shafai 4 https://orcid.org/0000-0001-7509-2120
Department of Aerospace and Mechanical Engineering, Saint Louis University, St. Louis, MO, USA
Department of Statistics, Noakhali Science and Technology University, Noakhali, Bangladesh
Department of Computer Science and Telecommunication Engineering, Noakhali Science and Technology University, Noakhali, Bangladesh
Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh, Saudi Arabia
Computer Science Department, Automated Systems and Soft Computing Laboratory (ASSCL), Prince Sultan University, Riyadh, Saudi Arabia
Radiology report summarization plays a critical role in medical imaging, addressing the growing need for concise and accessible interpretation of complex radiology findings. However, existing models often fail to fully leverage the potential of multimodal data integration. In this study, we propose a novel model, SumGPT, which integrates T5 with a Vision Transformer to harness the power of transformer-based architectures for enhanced radiology report summarization. The dataset used in this study comprises 1,952 radiology images with detailed textual reports for training and 488 images with reports for testing. The SumGPT technique was evaluated against several baseline models, including BERT + EfficientNet, XLM-RoBERTa + ViT, T5 + CLIP, VisualGPT (GPT-2 + ViT), and others, using a dataset explicitly designed for this task. The experimental results indicate that SumGPT outperformed all baseline models, achieving the highest performance across all metrics. Specifically, it attained a ROUGE-1 score of 0.8514, ROUGE-2 of 0.8471, ROUGE-L of 0.8514, and a BLEU score of 0.8470. The results demonstrate that SumGPT effectively produces clear and accurate summaries of radiology reports. Combining a Vision Transformer (ViT) with a language model enhances its ability to capture detailed information. The study also shows that SumGPT performs well with different types of reports and could be beneficial in other areas, such as pathology and cardiology. In the future, this approach could pave the way for applications in other medical domains while further optimizing the model for real-time clinical use.
https://ieeexplore.ieee.org/document/10836737/
Radiology
multimodal
report summarization
large language models
VisualGPT
spellingShingle Tipu Sultan
Mohammad Abu Tareq Rony
Mohammad Shariful Islam
Samah Alshathri
Walid El-Shafai
SumGPT: A Multimodal Framework for Radiology Report Summarization to Improve Clinical Performance
IEEE Access
Radiology
multimodal
report summarization
large language models
VisualGPT
title SumGPT: A Multimodal Framework for Radiology Report Summarization to Improve Clinical Performance
title_full SumGPT: A Multimodal Framework for Radiology Report Summarization to Improve Clinical Performance
title_fullStr SumGPT: A Multimodal Framework for Radiology Report Summarization to Improve Clinical Performance
title_full_unstemmed SumGPT: A Multimodal Framework for Radiology Report Summarization to Improve Clinical Performance
title_short SumGPT: A Multimodal Framework for Radiology Report Summarization to Improve Clinical Performance
title_sort sumgpt a multimodal framework for radiology report summarization to improve clinical performance
topic Radiology
multimodal
report summarization
large language models
VisualGPT
url https://ieeexplore.ieee.org/document/10836737/
work_keys_str_mv AT tipusultan sumgptamultimodalframeworkforradiologyreportsummarizationtoimproveclinicalperformance
AT mohammadabutareqrony sumgptamultimodalframeworkforradiologyreportsummarizationtoimproveclinicalperformance
AT mohammadsharifulislam sumgptamultimodalframeworkforradiologyreportsummarizationtoimproveclinicalperformance
AT samahalshathri sumgptamultimodalframeworkforradiologyreportsummarizationtoimproveclinicalperformance
AT walidelshafai sumgptamultimodalframeworkforradiologyreportsummarizationtoimproveclinicalperformance