SumGPT: A Multimodal Framework for Radiology Report Summarization to Improve Clinical Performance
Main Authors: | Tipu Sultan, Mohammad Abu Tareq Rony, Mohammad Shariful Islam, Samah Alshathri, Walid El-Shafai |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
Subjects: | Radiology, multimodal, report summarization, large language models, VisualGPT |
Online Access: | https://ieeexplore.ieee.org/document/10836737/ |
_version_ | 1832583268565778432 |
---|---|
author | Tipu Sultan, Mohammad Abu Tareq Rony, Mohammad Shariful Islam, Samah Alshathri, Walid El-Shafai |
collection | DOAJ |
description | Radiology report summarization plays a critical role in medical imaging, addressing the growing need for concise and accessible interpretation of complex radiology findings. However, existing models often fail to fully leverage the potential of multimodal data integration. In this study, we propose a novel model, SumGPT, which integrates T5 with a Vision Transformer (ViT) to harness transformer-based architectures for enhanced radiology report summarization. The dataset used in this study comprises 1,952 radiology images with detailed textual reports for training and 488 images with reports for testing. SumGPT was evaluated against several baseline models, including BERT + EfficientNet, XLM-RoBERTa + ViT, T5 + CLIP, VisualGPT (GPT-2 + ViT), and others, using a dataset explicitly designed for this task. The experimental results indicate that SumGPT outperformed all baseline models, achieving the highest performance across all metrics. Specifically, it attained a ROUGE-1 score of 0.8514, ROUGE-2 of 0.8471, ROUGE-L of 0.8514, and a BLEU score of 0.8470. The results demonstrate that SumGPT effectively produces clear and accurate summaries of radiology reports. Combining a ViT with a language model enhances the model's ability to capture detailed information. The study also shows that SumGPT performs well with different types of reports and could be beneficial in other areas, such as pathology and cardiology. In the future, this approach could pave the way for applications in other medical domains, and the model could be further optimized for real-time clinical use. |
format | Article |
id | doaj-art-f4fc08788fd14d71b2a0bdb36147eab1 |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
volume | 13 |
pages | 15929-15945 |
doi | 10.1109/ACCESS.2025.3528335 |
affiliation | Tipu Sultan (https://orcid.org/0009-0002-8607-0386): Department of Aerospace and Mechanical Engineering, Saint Louis University, St. Louis, MO, USA |
affiliation | Mohammad Abu Tareq Rony (https://orcid.org/0000-0002-0640-1425): Department of Statistics, Noakhali Science and Technology University, Noakhali, Bangladesh |
affiliation | Mohammad Shariful Islam (https://orcid.org/0009-0007-8442-1425): Department of Computer Science and Telecommunication Engineering, Noakhali Science and Technology University, Noakhali, Bangladesh |
affiliation | Samah Alshathri (https://orcid.org/0000-0002-8805-7890): Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh, Saudi Arabia |
affiliation | Walid El-Shafai (https://orcid.org/0000-0001-7509-2120): Computer Science Department, Automated Systems and Soft Computing Laboratory (ASSCL), Prince Sultan University, Riyadh, Saudi Arabia |
title | SumGPT: A Multimodal Framework for Radiology Report Summarization to Improve Clinical Performance |
topic | Radiology, multimodal, report summarization, large language models, VisualGPT |
url | https://ieeexplore.ieee.org/document/10836737/ |
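The abstract reports ROUGE-1, ROUGE-2, and ROUGE-L scores without defining them. As an illustration only, the Python sketch below computes ROUGE-N and sentence-level ROUGE-L F1 for a single candidate/reference pair. It assumes lowercased whitespace tokenization, no stemming, and the F1 variant of each score; the record does not state which ROUGE implementation or variant the authors used, so this sketch is not expected to reproduce the reported figures. The example sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    if cand_total == 0 or ref_total == 0 or overlap == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F1 using clipped n-gram overlap (assumed whitespace tokenization)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter '&' keeps the minimum count
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """Sentence-level ROUGE-L F1 based on the token LCS."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    # Hypothetical report impressions, not taken from the paper's dataset.
    reference = "no acute cardiopulmonary abnormality"
    candidate = "no acute cardiopulmonary process"
    print(round(rouge_n(candidate, reference, 1), 4))  # 0.75
    print(round(rouge_n(candidate, reference, 2), 4))  # 0.6667
    print(round(rouge_l(candidate, reference), 4))     # 0.75
```

Here the candidate shares three of four unigrams and two of three bigrams with the reference, and its longest common subsequence is three tokens long. Library implementations typically add options such as stemming and summary-level ROUGE-L, which can change the resulting scores.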