Multimodal marvels of deep learning in medical diagnosis using image, speech, and text: A comprehensive review of COVID-19 detection

This study presents a comprehensive review of the potential of multimodal deep learning (DL) in medical diagnosis, using COVID-19 as a case example. Motivated by the success of artificial intelligence applications during the COVID-19 pandemic, this research aims to uncover the capabilities of DL in...

Bibliographic Details
Main Authors: Md Shofiqul Islam, Khondokar Fida Hasan, Hasibul Hossain Shajeeb, Humayan Kabir Rana, Md. Saifur Rahman, Md. Munirul Hasan, AKM Azad, Ibrahim Abdullah, Mohammad Ali Moni
Format: Article
Language: English
Published: KeAi Communications Co. Ltd. 2025-01-01
Series: AI Open
Subjects: Deep learning; Image processing; Text; Speech; Medical diagnosis
Online Access: http://www.sciencedirect.com/science/article/pii/S2666651025000038
author Md Shofiqul Islam
Khondokar Fida Hasan
Hasibul Hossain Shajeeb
Humayan Kabir Rana
Md. Saifur Rahman
Md. Munirul Hasan
AKM Azad
Ibrahim Abdullah
Mohammad Ali Moni
author_sort Md Shofiqul Islam
collection DOAJ
description This study presents a comprehensive review of the potential of multimodal deep learning (DL) in medical diagnosis, using COVID-19 as a case example. Motivated by the success of artificial intelligence applications during the COVID-19 pandemic, this research aims to uncover the capabilities of DL in disease screening, prediction, and classification, and to derive insights that enhance the resilience, sustainability, and inclusiveness of science, technology, and innovation systems. Adopting a systematic approach, we investigate the fundamental methodologies, data sources, preprocessing steps, and challenges encountered in various studies and implementations. We explore the architecture of deep learning models, emphasising their data-specific structures and underlying algorithms. Subsequently, we compare different deep learning strategies utilised in COVID-19 analysis, evaluating them based on methodology, data, performance, and prerequisites for future research. By examining diverse data types and diagnostic modalities, this research contributes to the scientific understanding of the multimodal application of DL and its effectiveness in diagnosis. We implemented and analysed 11 deep learning models using COVID-19 image, text, and speech (i.e., cough) data. Our analysis revealed that the MobileNet model achieved the highest accuracy: 99.97% for COVID-19 image data and 93.73% for speech (i.e., cough) data. For COVID-19 text classification, however, the BiGRU model performed best, with an accuracy of 99.89%. The broader implications of this research suggest potential benefits for other domains and disciplines that could leverage deep learning techniques for image, text, and speech analysis.
format Article
id doaj-art-8f0ddc65619b4fedaf7263efafd9359d
institution Kabale University
issn 2666-6510
language English
publishDate 2025-01-01
publisher KeAi Communications Co. Ltd.
record_format Article
series AI Open
spelling doaj-art-8f0ddc65619b4fedaf7263efafd9359d
2025-02-05T04:32:43Z
AI Open, volume 6, pages 12-44
Md Shofiqul Islam: Institute for Intelligent Systems Research and Innovation (ISSRI), Deakin University, 75 Pigdons Rd, Waurn Ponds, Geelong, 3216, Victoria, Australia; Universiti Malaysia Pahang Al-Sultan Abdullah (UMPSA), Kuantan, Pahang, 26600 Pekan, Malaysia
Khondokar Fida Hasan: School of Professional Studies, University of New South Wales, UNSW, Canberra, 2601, ACT, Australia; Corresponding author.
Hasibul Hossain Shajeeb: Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Mirpur-2, Dhaka, Bangladesh
Humayan Kabir Rana: Department of Computer Science and Engineering, Green University, Narayanganj, Dhaka, 1461, Bangladesh
Md. Saifur Rahman: Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Mirpur-2, Dhaka, Bangladesh
Md. Munirul Hasan: Universiti Malaysia Pahang Al-Sultan Abdullah (UMPSA), Kuantan, Pahang, 26600 Pekan, Malaysia
AKM Azad: Department of Mathematics and Statistics, Faculty of Science, Imam Mohammad Ibn Saud Islamic University (IMSIU), 13318, Riyadh, Saudi Arabia
Ibrahim Abdullah: Islamic University, Kushtia, 7600, Bangladesh
Mohammad Ali Moni: Artificial Intelligence and Cyber Futures Institute, Charles Sturt University, Panorama Ave, Bathurst, 2795, New South Wales, Australia; AI & Digital Health Technology, RURAL Health Research Institute, Charles Sturt University, Orange, 2800, New South Wales, Australia; Corresponding author at: Artificial Intelligence and Cyber Futures Institute, Charles Sturt University, Panorama Ave, Bathurst, 2795, New South Wales, Australia.
title Multimodal marvels of deep learning in medical diagnosis using image, speech, and text: A comprehensive review of COVID-19 detection
topic Deep learning
Image processing
Text
Speech
Medical diagnosis
url http://www.sciencedirect.com/science/article/pii/S2666651025000038