FedDrip: Federated Learning With Diffusion-Generated Synthetic Image

Bibliographic Details
Main Authors: Karin Huangsuwan, Timothy Liu, Simon See, Aik Beng Ng, Peerapon Vateekul
Format: Article
Language:English
Published: IEEE 2025-01-01
Series:IEEE Access
Online Access:https://ieeexplore.ieee.org/document/10824802/
Description
Summary:In the realm of machine learning in healthcare, federated learning (FL) is often recognized as a practical solution for addressing issues related to data privacy and data distribution. However, many real-world datasets are not independent and identically distributed (non-IID); that is, the data characteristics differ from one institution to another. Non-IID data poses challenges to the convergence of FL models, such as client drift, where model weights drift toward local optima instead of the global optimum. Leveraging synthetic data not only addresses data distribution issues but also helps overcome data scarcity. In this work, we propose a novel framework called “FedDrip (Federated Learning with Diffusion Reinforcement at Pseudo-site)” that utilizes diffusion-generated synthetic data to alleviate these challenges in non-IID environments. In addition to traditional federated learning, we introduce a pseudo-site concept, in which an additional model, the pseudo-site student, is trained on synthetic data to extract knowledge from the real clients acting as teacher models. This strategy improves model generalization across diverse datasets while preserving data privacy. Moreover, we demonstrate that this virtual-client strategy can be integrated into any federated learning framework, from foundational algorithms such as FedAvg to advanced algorithms such as FedDyn and FedProx. Experiments conducted on the NIH ChestX-ray14 dataset show that our method improves the AUC of FedAvg, FedDyn, and FedProx by 2.15%, 0.95%, and 1.96%, respectively. We also conduct empirical experiments investigating the effects of prompting style, prompt correctness, and data availability on diffusion-model inference, evaluated with the Fréchet Inception Distance (FID) metric for generative models.
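
To make the pseudo-site idea concrete, below is a minimal sketch of one FedDrip-style round layered on FedAvg, assuming PyTorch. All names (fedavg_aggregate, pseudo_site_distill, the temperature T, the loaders) are illustrative and not taken from the paper; the softmax-based loss is the textbook distillation form, whereas a sigmoid/BCE variant would be more natural for the multi-label ChestX-ray14 task the abstract targets.

```python
# Hypothetical sketch: a pseudo-site student distills knowledge from real-client
# teacher models on diffusion-generated images, then joins the FedAvg average
# as a virtual client. Assumes PyTorch; all names here are illustrative.
import copy
import torch
import torch.nn.functional as F

def fedavg_aggregate(states, sizes):
    """Plain FedAvg: average client state_dicts weighted by local dataset size."""
    total = sum(sizes)
    avg = copy.deepcopy(states[0])
    for key in avg:
        # .float() guards against integer buffers such as BatchNorm counters
        avg[key] = sum(n * s[key].float() for n, s in zip(sizes, states)) / total
    return avg

def pseudo_site_distill(student, teachers, synthetic_loader, lr=1e-3, T=2.0):
    """Train the pseudo-site student on synthetic images against the averaged
    soft labels of the real-client teachers (standard distillation loss)."""
    opt = torch.optim.SGD(student.parameters(), lr=lr)
    for t in teachers:
        t.eval()
    student.train()
    for x, _ in synthetic_loader:  # labels unused: teachers provide soft targets
        with torch.no_grad():
            soft = torch.stack([F.softmax(t(x) / T, dim=1) for t in teachers]).mean(0)
        loss = F.kl_div(F.log_softmax(student(x) / T, dim=1), soft,
                        reduction="batchmean") * T * T
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student.state_dict()

# One communication round: real clients train locally (not shown), then the
# pseudo-site's distilled weights enter the average as one extra client:
# global_state = fedavg_aggregate(client_states + [pseudo_state],
#                                 client_sizes + [pseudo_size])
```

Because the student only ever sees synthetic images and the teachers' output probabilities, no real patient data leaves the clients, which is the privacy argument the abstract makes.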
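
The FID evaluation mentioned at the end of the abstract can be outlined as follows. The use of torchmetrics is an assumption, since the record does not name a library, and the tensors here are random stand-ins for real and diffusion-generated chest X-rays.

```python
# Hedged sketch of computing FID between real and diffusion-generated images.
# torchmetrics is an assumed choice; inputs are uint8 (N, 3, H, W) tensors.
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(feature=2048)

# Stand-ins for batches of real and synthetic images (random noise here).
real = torch.randint(0, 256, (64, 3, 299, 299), dtype=torch.uint8)
fake = torch.randint(0, 256, (64, 3, 299, 299), dtype=torch.uint8)

fid.update(real, real=True)    # accumulate Inception features of real images
fid.update(fake, real=False)   # accumulate features of synthetic images
print(f"FID: {fid.compute():.2f}")  # lower means distributions are closer
```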
ISSN:2169-3536