FedDrip: Federated Learning With Diffusion-Generated Synthetic Image

Bibliographic Details
Main Authors: Karin Huangsuwan, Timothy Liu, Simon See, Aik Beng Ng, Peerapon Vateekul
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects: Federated learning; deep learning; diffusion model; X-ray classification; prompt engineering; non-IID data
Online Access: https://ieeexplore.ieee.org/document/10824802/
author Karin Huangsuwan
Timothy Liu
Simon See
Aik Beng Ng
Peerapon Vateekul
collection DOAJ
description In the realm of machine learning in healthcare, federated learning (FL) is often recognized as a practical solution for addressing issues related to data privacy and data distribution. However, many real-world datasets are not identically and independently distributed (non-IID). That is, the data characteristics differ from one institute to another. Non-IID data poses challenges to the convergence of FL models, such as client drifting, where the model weights drift towards local optima instead of the global optimum. As a solution, leveraging synthetic data not only addresses data distribution issues but also helps overcome data scarcity. In this work, we propose a novel framework called “FedDrip (Federated Learning with Diffusion Reinforcement at Pseudo-site)” that utilizes diffusion-generated synthetic data to alleviate these challenges in non-IID environments. In addition to traditional federated learning, we introduce a pseudo-site concept, where an additional model—the pseudo-site student model—is trained using synthetic data to extract knowledge from real clients acting as teacher models. By leveraging this strategy, the pseudo-site improves model generalization across diverse datasets while preserving data privacy. Moreover, we demonstrate that this virtual-client strategy can be integrated into any federated learning framework, including foundational algorithms such as FedAvg and advanced algorithms such as FedDyn and FedProx. Experiments conducted on the NIH ChestX-ray14 dataset show that our method enhances the performance of the state-of-the-art and foundational methods by 2.15%, 0.95%, and 1.96% for FedAvg, FedDyn, and FedProx, respectively, using the AUC metric. We also performed empirical experiments investigating the effects of prompting style, prompt correctness, and data availability on the inference of diffusion models, using the Fréchet Inception Distance (FID) metric for generative models.
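The abstract above outlines the pseudo-site mechanism only at a high level: a student model is trained on diffusion-generated synthetic images while distilling knowledge from the real clients' models, which act as teachers. The sketch below illustrates that idea and is not the authors' implementation; it assumes PyTorch, multi-label classifiers with sigmoid outputs (as is typical for NIH ChestX-ray14), simple averaging of teacher predictions, and hypothetical names such as pseudo_site_update and synthetic_loader.

import torch
import torch.nn.functional as F

def pseudo_site_update(student, client_models, synthetic_loader, optimizer):
    # Train the pseudo-site student on synthetic images by distilling the
    # averaged soft predictions of the real-client teacher models.
    student.train()
    for teacher in client_models:
        teacher.eval()

    for images in synthetic_loader:  # diffusion-generated images only
        with torch.no_grad():
            # Soft multi-label targets: mean of the teachers' sigmoid outputs.
            soft_targets = torch.stack(
                [torch.sigmoid(teacher(images)) for teacher in client_models]
            ).mean(dim=0)

        logits = student(images)
        # Soft-target binary cross-entropy serves here as the distillation
        # loss for multi-label chest X-ray classification.
        loss = F.binary_cross_entropy_with_logits(logits, soft_targets)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # The updated weights would then join the usual aggregation step
    # (e.g. FedAvg, FedDyn, or FedProx) alongside the real clients.
    return student.state_dict()

How the pseudo-site update is scheduled within each round, which distillation loss is actually used, and how its weights are treated during aggregation are specified in the full article linked below.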
format Article
id doaj-art-108062df2f4645c7866095702d8bfd22
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling DOI: 10.1109/ACCESS.2025.3525806; IEEE Xplore document 10824802; IEEE Access, vol. 13, pp. 10111-10125, published 2025-01-01; indexed 2025-01-21T00:01:49Z
Karin Huangsuwan, https://orcid.org/0009-0001-8456-2662, Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand
Timothy Liu, https://orcid.org/0009-0006-6563-5439, NVIDIA AI Technology Center, 32 Carpenter Street, Singapore
Simon See, https://orcid.org/0000-0002-4958-9237, NVIDIA AI Technology Center, 32 Carpenter Street, Singapore
Aik Beng Ng, https://orcid.org/0009-0009-1291-1753, NVIDIA AI Technology Center, 32 Carpenter Street, Singapore
Peerapon Vateekul, https://orcid.org/0000-0001-9718-3592, Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand
title FedDrip: Federated Learning With Diffusion-Generated Synthetic Image
topic Federated learning
deep learning
diffusion model
X-ray classification
prompt engineering
non-IID data
url https://ieeexplore.ieee.org/document/10824802/