Multimodal marvels of deep learning in medical diagnosis using image, speech, and text: A comprehensive review of COVID-19 detection
This study presents a comprehensive review of the potential of multimodal deep learning (DL) in medical diagnosis, using COVID-19 as a case example. Motivated by the success of artificial intelligence applications during the COVID-19 pandemic, this research aims to uncover the capabilities of DL in...
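Main Authors: | Md Shofiqul Islam, Khondokar Fida Hasan, Hasibul Hossain Shajeeb, Humayan Kabir Rana, Md. Saifur Rahman, Md. Munirul Hasan, AKM Azad, Ibrahim Abdullah, Mohammad Ali Moni |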
---|---|
Format: | Article |
Language: | English |
Published: | KeAi Communications Co. Ltd., 2025-01-01 |
Series: | AI Open |
Subjects: | Deep learning; Image processing; Text; Speech; Medical diagnosis |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2666651025000038 |
_version_ | 1832540406052552704 |
---|---|
author | Md Shofiqul Islam, Khondokar Fida Hasan, Hasibul Hossain Shajeeb, Humayan Kabir Rana, Md. Saifur Rahman, Md. Munirul Hasan, AKM Azad, Ibrahim Abdullah, Mohammad Ali Moni |
author_sort | Md Shofiqul Islam |
collection | DOAJ |
description | This study presents a comprehensive review of the potential of multimodal deep learning (DL) in medical diagnosis, using COVID-19 as a case example. Motivated by the success of artificial intelligence applications during the COVID-19 pandemic, this research aims to uncover the capabilities of DL in disease screening, prediction, and classification, and to derive insights that enhance the resilience, sustainability, and inclusiveness of science, technology, and innovation systems. Adopting a systematic approach, we investigate the fundamental methodologies, data sources, preprocessing steps, and challenges encountered in various studies and implementations. We explore the architecture of deep learning models, emphasising their data-specific structures and underlying algorithms. Subsequently, we compare different deep learning strategies utilised in COVID-19 analysis, evaluating them based on methodology, data, performance, and prerequisites for future research. By examining diverse data types and diagnostic modalities, this research contributes to the scientific understanding of multimodal DL and its effectiveness in diagnosis. We have implemented and analysed 11 deep learning models using COVID-19 image, text, and speech (i.e., cough) data. Our analysis revealed that the MobileNet model achieved the highest accuracy on COVID-19 image data (99.97%) and speech data (i.e., cough; 93.73%). For COVID-19 text classification, the BiGRU model performed best, with an accuracy of 99.89%. The broader implications of this research suggest potential benefits for other domains and disciplines that could leverage deep learning techniques for image, text, and speech analysis. |
format | Article |
id | doaj-art-8f0ddc65619b4fedaf7263efafd9359d |
institution | Kabale University |
issn | 2666-6510 |
language | English |
publishDate | 2025-01-01 |
publisher | KeAi Communications Co. Ltd. |
record_format | Article |
series | AI Open |
affiliations | Md Shofiqul Islam: Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, 75 Pigdons Rd, Waurn Ponds, Geelong, 3216, Victoria, Australia; Universiti Malaysia Pahang Al-Sultan Abdullah (UMPSA), Kuantan, Pahang, 26600 Pekan, Malaysia. Khondokar Fida Hasan (corresponding author): School of Professional Studies, University of New South Wales, UNSW, Canberra, 2601, ACT, Australia. Hasibul Hossain Shajeeb: Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Mirpur-2, Dhaka, Bangladesh. Humayan Kabir Rana: Department of Computer Science and Engineering, Green University, Narayanganj, Dhaka, 1461, Bangladesh. Md. Saifur Rahman: Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Mirpur-2, Dhaka, Bangladesh. Md. Munirul Hasan: Universiti Malaysia Pahang Al-Sultan Abdullah (UMPSA), Kuantan, Pahang, 26600 Pekan, Malaysia. AKM Azad: Department of Mathematics and Statistics, Faculty of Science, Imam Mohammad Ibn Saud Islamic University (IMSIU), 13318, Riyadh, Saudi Arabia. Ibrahim Abdullah: Islamic University, Kushtia, 7600, Bangladesh. Mohammad Ali Moni (corresponding author): Artificial Intelligence and Cyber Futures Institute, Charles Sturt University, Panorama Ave, Bathurst, 2795, New South Wales, Australia; AI & Digital Health Technology, RURAL Health Research Institute, Charles Sturt University, Orange, 2800, New South Wales, Australia. |
title | Multimodal marvels of deep learning in medical diagnosis using image, speech, and text: A comprehensive review of COVID-19 detection |
topic | Deep learning; Image processing; Text; Speech; Medical diagnosis |
url | http://www.sciencedirect.com/science/article/pii/S2666651025000038 |