Multimodal marvels of deep learning in medical diagnosis using image, speech, and text: A comprehensive review of COVID-19 detection

Bibliographic Details
Main Authors:	Md Shofiqul Islam, Khondokar Fida Hasan, Hasibul Hossain Shajeeb, Humayan Kabir Rana, Md. Saifur Rahman, Md. Munirul Hasan, AKM Azad, Ibrahim Abdullah, Mohammad Ali Moni
Format:	Article
Language:	English
Published:	KeAi Communications Co. Ltd. 2025-01-01
Series:	AI Open
Subjects:	Deep learning; Image processing; Text; Speech; Medical diagnosis
Online Access:	http://www.sciencedirect.com/science/article/pii/S2666651025000038

Collection:	DOAJ
Description:	This study presents a comprehensive review of the potential of multimodal deep learning (DL) in medical diagnosis, using COVID-19 as a case example. Motivated by the success of artificial intelligence applications during the COVID-19 pandemic, this research aims to uncover the capabilities of DL in disease screening, prediction, and classification, and to derive insights that enhance the resilience, sustainability, and inclusiveness of science, technology, and innovation systems. Adopting a systematic approach, we investigate the fundamental methodologies, data sources, preprocessing steps, and challenges encountered in various studies and implementations. We explore the architecture of deep learning models, emphasising their data-specific structures and underlying algorithms. We then compare the deep learning strategies used in COVID-19 analysis, evaluating them on methodology, data, performance, and prerequisites for future research. By examining diverse data types and diagnostic modalities, this research contributes to the understanding of multimodal DL applications and their effectiveness in diagnosis. We implemented and analysed 11 deep learning models using COVID-19 image, text, and speech (i.e., cough) data. The MobileNet model achieved the highest accuracy on both image data (99.97%) and cough data (93.73%), whereas the BiGRU model performed best in COVID-19 text classification, with an accuracy of 99.89%. These findings suggest potential benefits for other domains and disciplines that apply deep learning to image, text, and speech analysis.
ID:	doaj-art-8f0ddc65619b4fedaf7263efafd9359d
Institution:	Kabale University
ISSN:	2666-6510
Affiliations:
Md Shofiqul Islam: Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, 75 Pigdons Rd, Waurn Ponds, Geelong, 3216, Victoria, Australia; Universiti Malaysia Pahang Al-Sultan Abdullah (UMPSA), Kuantan, Pahang, 26600 Pekan, Malaysia
Khondokar Fida Hasan (corresponding author): School of Professional Studies, University of New South Wales (UNSW), Canberra, 2601, ACT, Australia
Hasibul Hossain Shajeeb: Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Mirpur-2, Dhaka, Bangladesh
Humayan Kabir Rana: Department of Computer Science and Engineering, Green University, Narayanganj, Dhaka, 1461, Bangladesh
Md. Saifur Rahman: Department of Computer Science and Engineering, Bangladesh University of Business and Technology, Mirpur-2, Dhaka, Bangladesh
Md. Munirul Hasan: Universiti Malaysia Pahang Al-Sultan Abdullah (UMPSA), Kuantan, Pahang, 26600 Pekan, Malaysia
AKM Azad: Department of Mathematics and Statistics, Faculty of Science, Imam Mohammad Ibn Saud Islamic University (IMSIU), 13318, Riyadh, Saudi Arabia
Ibrahim Abdullah: Islamic University, Kushtia, 7600, Bangladesh
Mohammad Ali Moni (corresponding author): Artificial Intelligence and Cyber Futures Institute, Charles Sturt University, Panorama Ave, Bathurst, 2795, New South Wales, Australia; AI & Digital Health Technology, RURAL Health Research Institute, Charles Sturt University, Orange, 2800, New South Wales, Australia

Similar Items