FedDrip: Federated Learning With Diffusion-Generated Synthetic Image
Main Authors:
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10824802/
Summary: In the realm of machine learning in healthcare, federated learning (FL) is often recognized as a practical solution for addressing issues related to data privacy and data distribution. However, many real-world datasets are not independently and identically distributed (non-IID); that is, the data characteristics differ from one institute to another. Non-IID data poses challenges to the convergence of FL models, such as client drift, where the model weights drift towards local optima instead of the global optimum. As a solution, leveraging synthetic data not only addresses data distribution issues but also helps overcome data scarcity. In this work, we propose a novel framework called "FedDrip (Federated Learning with Diffusion Reinforcement at Pseudo-site)" that utilizes diffusion-generated synthetic data to alleviate these challenges in non-IID environments. In addition to traditional federated learning, we introduce a pseudo-site concept, where an additional model, the pseudo-site student model, is trained using synthetic data to extract knowledge from real clients acting as teacher models. By leveraging this strategy, the pseudo-site improves model generalization across diverse datasets while preserving data privacy. Moreover, we demonstrate that this virtual-client strategy can be integrated into any federated learning framework, including foundational algorithms such as FedAvg and advanced algorithms such as FedDyn and FedProx. Experiments conducted on the NIH ChestX-ray14 dataset show that our method enhances the performance of state-of-the-art and foundational methods by 2.15%, 0.95%, and 1.96% for FedAvg, FedDyn, and FedProx, respectively, using the AUC metric. We also performed empirical experiments investigating the effects of prompting style, prompt correctness, and data availability on the inference of diffusion models, using the Fréchet Inception Distance (FID) metric for generative models.
ISSN: 2169-3536
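The pseudo-site idea in the summary can be sketched in miniature: real clients take local steps on their own non-IID data, a virtual "pseudo-site" student distills the averaged teacher predictions on a stand-in synthetic dataset, and the student joins the FedAvg aggregation. This is a toy illustration only, assuming a linear least-squares model and NumPy in place of the paper's actual chest X-ray models and diffusion-generated images; all names (`pseudo_site_step`, `X_syn`, etc.) are illustrative, not from FedDrip.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_step(w, X, y, lr=0.1):
    """One gradient step of least-squares on a client's private data."""
    grad = X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def fedavg(weights):
    """Standard FedAvg: average the clients' weight vectors."""
    return np.mean(weights, axis=0)

def pseudo_site_step(w_student, teachers, X_syn, lr=0.1):
    """Distill averaged teacher predictions on synthetic inputs into the student.

    The student never sees real client data, only the synthetic inputs and
    the teachers' predictions on them (the privacy-preserving ingredient).
    """
    y_teacher = np.mean([X_syn @ w_t for w_t in teachers], axis=0)
    grad = X_syn.T @ (X_syn @ w_student - y_teacher) / len(X_syn)
    return w_student - lr * grad

d = 3
w_true = np.array([1.0, -2.0, 0.5])

# Non-IID clients: each samples inputs from a differently shifted region.
clients = []
for shift in (-1.0, 0.0, 1.0):
    X = rng.normal(shift, 1.0, size=(50, d))
    clients.append((X, X @ w_true + 0.01 * rng.normal(size=50)))

# Stand-in for diffusion-generated synthetic data at the pseudo-site.
X_syn = rng.normal(0.0, 1.0, size=(100, d))

w_global = np.zeros(d)
for _ in range(200):
    locals_ = [local_step(w_global.copy(), X, y) for X, y in clients]
    w_student = pseudo_site_step(w_global.copy(), locals_, X_syn)
    w_global = fedavg(locals_ + [w_student])  # pseudo-site joins aggregation

print(np.round(w_global, 2))
```

In this toy run the aggregated weights approach `w_true` despite the shifted client distributions; the pseudo-site simply acts as one more client whose "data" is synthetic and whose labels come from the teachers.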