The Data Heterogeneity Issue Regarding COVID-19 Lung Imaging in Federated Learning: An Experimental Study

Federated learning (FL) has emerged as a transformative framework for collaborative learning, offering robust model training across institutions while ensuring data privacy. In the context of making a COVID-19 diagnosis using lung imaging, FL enables institutions to collaboratively train a global mo...

Bibliographic Details
Main Authors: Fatimah Alhafiz, Abdullah Basuhail
Format: Article
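Language: English
Published: MDPI AG 2025-01-01
Series: Big Data and Cognitive Computing
Subjects: federated learning; data heterogeneity; non-IID; global model; skew types; generalization metric
Online Access: https://www.mdpi.com/2504-2289/9/1/11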
collection DOAJ
description Federated learning (FL) has emerged as a transformative framework for collaborative learning, offering robust model training across institutions while ensuring data privacy. In the context of making a COVID-19 diagnosis using lung imaging, FL enables institutions to collaboratively train a global model without sharing sensitive patient data. A central manager aggregates local model updates to compute global updates, ensuring secure and effective integration. The global model’s generalization capability is evaluated using centralized testing data before dissemination to participating nodes, where local assessments facilitate personalized adaptations tailored to diverse datasets. Addressing data heterogeneity, a critical challenge in medical imaging, is essential for improving both global performance and local personalization in FL systems. This study emphasizes the importance of recognizing real-world data variability before proposing solutions to tackle non-independent and non-identically distributed (non-IID) data. We investigate the impact of data heterogeneity on FL performance in COVID-19 lung imaging across seven distinct heterogeneity settings. By comprehensively evaluating models using generalization and personalization metrics, we highlight challenges and opportunities for optimizing FL frameworks. The findings provide valuable insights that can guide future research toward achieving a balance between global generalization and local adaptation, ultimately enhancing diagnostic accuracy and patient outcomes in COVID-19 lung imaging.
id doaj-art-fb5cca5e39414d6c853ec58d2a01ff8d
institution Kabale University
issn 2504-2289
spelling doaj-art-fb5cca5e39414d6c853ec58d2a01ff8d; 2025-01-24T13:22:32Z; eng; MDPI AG; Big Data and Cognitive Computing; 2504-2289; 2025-01-01; 9(1):11; 10.3390/bdcc9010011
Fatimah Alhafiz (0), Abdullah Basuhail (1)
(0) Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
(1) Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
topic federated learning
data heterogeneity
non-IID
global model
skew types
generalization metric
url https://www.mdpi.com/2504-2289/9/1/11