FedDrip: Federated Learning With Diffusion-Generated Synthetic Image
In the realm of machine learning in healthcare, federated learning (FL) is often recognized as a practical solution for addressing issues related to data privacy and data distribution. However, many real-world datasets are not identically and independently distributed (non-IID). That is, the data characteristics differ from one institute to another.
Main Authors: | Karin Huangsuwan, Timothy Liu, Simon See, Aik Beng Ng, Peerapon Vateekul |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
Subjects: | Federated learning; deep learning; diffusion model; X-ray classification; prompt engineering; non-IID data |
Online Access: | https://ieeexplore.ieee.org/document/10824802/ |
_version_ | 1832592906968367104 |
---|---|
author | Karin Huangsuwan; Timothy Liu; Simon See; Aik Beng Ng; Peerapon Vateekul |
author_facet | Karin Huangsuwan; Timothy Liu; Simon See; Aik Beng Ng; Peerapon Vateekul |
author_sort | Karin Huangsuwan |
collection | DOAJ |
description | In the realm of machine learning in healthcare, federated learning (FL) is often recognized as a practical solution for addressing issues related to data privacy and data distribution. However, many real-world datasets are not identically and independently distributed (non-IID). That is, the data characteristics differ from one institute to another. Non-IID data poses challenges to the convergence of FL models, such as client drifting, where the model weights drift towards local optima instead of the global optimum. As a solution, leveraging synthetic data not only addresses data distribution issues but also helps overcome data scarcity. In this work, we propose a novel framework called “FedDrip (Federated Learning with Diffusion Reinforcement at Pseudo-site)” that utilizes diffusion-generated synthetic data to alleviate these challenges in non-IID environments. In addition to traditional federated learning, we introduce a pseudo-site concept, where an additional model—the pseudo-site student model—is trained using synthetic data to extract knowledge from real clients acting as teacher models. By leveraging this strategy, the pseudo-site improves model generalization across diverse datasets while preserving data privacy. Moreover, we demonstrate that this virtual-client strategy can be integrated into any federated learning framework, including foundational algorithms such as FedAvg and advanced algorithms such as FedDyn and FedProx. Experiments conducted on the NIH ChestX-ray14 dataset show that our method enhances the performance of the state-of-the-art and foundational methods by 2.15%, 0.95%, and 1.96% for FedAvg, FedDyn, and FedProx, respectively, using the AUC metric. We also performed empirical experiments investigating the effects of prompting style, prompt correctness, and data availability on the inference of diffusion models, using the Fréchet Inception Distance (FID) metric for generative models. |
format | Article |
id | doaj-art-108062df2f4645c7866095702d8bfd22 |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | Record ID doaj-art-108062df2f4645c7866095702d8bfd22; indexed 2025-01-21T00:01:49Z. English. IEEE, IEEE Access, ISSN 2169-3536, published 2025-01-01, vol. 13, pp. 10111-10125, DOI 10.1109/ACCESS.2025.3525806, IEEE document 10824802. FedDrip: Federated Learning With Diffusion-Generated Synthetic Image. Karin Huangsuwan (https://orcid.org/0009-0001-8456-2662), Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand; Timothy Liu (https://orcid.org/0009-0006-6563-5439), NVIDIA AI Technology Center, 32 Carpenter Street, Singapore; Simon See (https://orcid.org/0000-0002-4958-9237), NVIDIA AI Technology Center, 32 Carpenter Street, Singapore; Aik Beng Ng (https://orcid.org/0009-0009-1291-1753), NVIDIA AI Technology Center, 32 Carpenter Street, Singapore; Peerapon Vateekul (https://orcid.org/0000-0001-9718-3592), Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand. Online access: https://ieeexplore.ieee.org/document/10824802/. Topics: Federated learning; deep learning; diffusion model; X-ray classification; prompt engineering; non-IID data |
spellingShingle | Karin Huangsuwan; Timothy Liu; Simon See; Aik Beng Ng; Peerapon Vateekul; FedDrip: Federated Learning With Diffusion-Generated Synthetic Image; IEEE Access; Federated learning; deep learning; diffusion model; X-ray classification; prompt engineering; non-IID data |
title | FedDrip: Federated Learning With Diffusion-Generated Synthetic Image |
title_full | FedDrip: Federated Learning With Diffusion-Generated Synthetic Image |
title_fullStr | FedDrip: Federated Learning With Diffusion-Generated Synthetic Image |
title_full_unstemmed | FedDrip: Federated Learning With Diffusion-Generated Synthetic Image |
title_short | FedDrip: Federated Learning With Diffusion-Generated Synthetic Image |
title_sort | feddrip federated learning with diffusion generated synthetic image |
topic | Federated learning; deep learning; diffusion model; X-ray classification; prompt engineering; non-IID data |
url | https://ieeexplore.ieee.org/document/10824802/ |
work_keys_str_mv | AT karinhuangsuwan feddripfederatedlearningwithdiffusiongeneratedsyntheticimage AT timothyliu feddripfederatedlearningwithdiffusiongeneratedsyntheticimage AT simonsee feddripfederatedlearningwithdiffusiongeneratedsyntheticimage AT aikbengng feddripfederatedlearningwithdiffusiongeneratedsyntheticimage AT peeraponvateekul feddripfederatedlearningwithdiffusiongeneratedsyntheticimage |
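The abstract above describes a pseudo-site: a virtual client trained only on diffusion-generated synthetic images, which distils knowledge from the real clients (acting as teachers) and is then aggregated alongside them. The sketch below is a minimal, hypothetical illustration of that idea on top of FedAvg-style averaging, assuming PyTorch. It is not the authors' implementation; all helper names (`local_update`, `pseudo_site_update`, `synthetic_loader`) and the choice of the current round's client models as teachers are assumptions for illustration only.

```python
# Minimal sketch (not the paper's code) of FedAvg plus a "pseudo-site":
# an extra virtual client that never sees real data and instead learns
# from diffusion-generated synthetic images by distilling the soft
# predictions of the real-client models. All names are hypothetical.
import copy
import torch
import torch.nn.functional as F


def fedavg(global_model, client_states, weights):
    """Standard FedAvg: weighted average of client state_dicts."""
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = sum(w * s[key].float() for w, s in zip(weights, client_states))
    global_model.load_state_dict(avg)
    return global_model


def local_update(global_model, loader, epochs=1, lr=1e-3):
    """Ordinary supervised training on one real client's private data
    (multi-label chest X-ray classification, hence BCE-with-logits)."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            F.binary_cross_entropy_with_logits(model(x), y).backward()
            opt.step()
    return model.state_dict()


def pseudo_site_update(global_model, teachers, synthetic_loader, epochs=1, lr=1e-3, T=2.0):
    """Pseudo-site: a student trained only on synthetic images, matching the
    averaged (temperature-softened) predictions of the teacher models."""
    student = copy.deepcopy(global_model)
    opt = torch.optim.SGD(student.parameters(), lr=lr)
    for t in teachers:
        t.eval()
    for _ in range(epochs):
        for x in synthetic_loader:  # synthetic images need no ground-truth labels
            with torch.no_grad():
                soft = torch.stack([torch.sigmoid(t(x) / T) for t in teachers]).mean(0)
            opt.zero_grad()
            F.binary_cross_entropy_with_logits(student(x) / T, soft).backward()
            opt.step()
    return student.state_dict()


def communication_round(global_model, client_loaders, synthetic_loader):
    """One round: real clients update locally, the pseudo-site distils from
    them on synthetic data, and all updates are averaged as in plain FedAvg."""
    states = [local_update(global_model, dl) for dl in client_loaders]
    teachers = []
    for s in states:
        t = copy.deepcopy(global_model)
        t.load_state_dict(s)
        teachers.append(t)
    states.append(pseudo_site_update(global_model, teachers, synthetic_loader))
    weights = [1.0 / len(states)] * len(states)
    return fedavg(global_model, states, weights)
```

Because the aggregation step itself is unchanged, the same pseudo-site hook could in principle be attached to FedProx or FedDyn by swapping `local_update` for their regularized local objectives, which mirrors the plug-in claim made in the abstract.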