A privacy-enhanced framework for collaborative Big Data analysis in healthcare using adaptive federated learning aggregation

Bibliographic Details
Main Authors: Rahul Haripriya, Nilay Khare, Manish Pandey, Sreemoyee Biswas
Format: Article
Language: English
Published: SpringerOpen 2025-05-01
Series: Journal of Big Data
Online Access: https://doi.org/10.1186/s40537-025-01169-8
Description
Summary: The exponential growth of Big Data in healthcare, particularly in AI-driven medical diagnostics, has raised critical concerns about data privacy in medical image classification. With over 30% of healthcare organizations worldwide experiencing data breaches in the past year, the demand for secure, privacy-preserving solutions is more urgent than ever. This study explores a federated learning approach combined with transfer learning to enhance privacy in medical image classification using ResNet and VGG16 architectures. Pre-trained on ImageNet and fine-tuned on three specialized medical datasets (TB chest X-rays, brain tumor MRI scans, and diabetic retinopathy images), these models were deployed in a simulated multi-center healthcare environment. A major contribution of this work is an adaptive aggregation methodology that dynamically selects between Federated Averaging (FedAvg) and Federated Stochastic Gradient Descent (FedSGD) based on the real-time data divergence observed across participating clients. Conventional static aggregation methods apply the same update rule uniformly, regardless of data heterogeneity. The proposed approach instead monitors gradients and data distributions at each communication round and selects the more suitable aggregation method for that round. This adaptive strategy improves convergence and optimizes resource utilization, making it suitable for multi-center healthcare networks where data heterogeneity is prevalent. The novelty of the proposed adaptive aggregation lies in its ability to maintain robust performance while minimizing computational costs, making it feasible for large-scale healthcare AI networks such as hospital federated learning systems.
Comparative analysis with baseline FL models, including FedAvg and FedSGD, shows that the adaptive aggregation method achieves comparable accuracy (up to 96.3%) and a competitive F1-score while reducing execution time by approximately 20%. Additionally, the integration of privacy-preserving techniques ensures that sensitive patient data remains secure throughout the learning process. By integrating transfer learning with federated learning, this study presents a scalable and privacy-preserving framework for Big Data analytics in healthcare. The findings underscore the potential of adaptive aggregation to enhance federated learning efficiency across heterogeneous datasets, enabling medical institutions to develop high-accuracy diagnostic models without direct access to patient data.
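
The summary describes a server that chooses between FedAvg and FedSGD at each communication round according to how much the clients diverge. The record does not give the divergence measure, the threshold, or which method is preferred under high divergence, so the following is only a minimal illustrative sketch, not the authors' implementation. The function names, the mean pairwise cosine distance used as the divergence score, the `threshold` and `lr` values, and the rule "high divergence selects FedSGD" are all assumptions made for the example.

```python
import math


def weighted_mean(vectors, sizes):
    """Sample-size-weighted mean of equal-length vectors (lists of floats)."""
    total = sum(sizes)
    dim = len(vectors[0])
    return [sum(v[i] * n for v, n in zip(vectors, sizes)) / total
            for i in range(dim)]


def cosine_distance(a, b):
    """1 - cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return 1.0 - dot / (norm_a * norm_b)


def divergence(client_grads):
    """Hypothetical divergence score: mean pairwise cosine distance
    between client gradients (0 means all clients point the same way)."""
    pairs = [(a, b) for i, a in enumerate(client_grads)
             for b in client_grads[i + 1:]]
    if not pairs:
        return 0.0
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)


def adaptive_aggregate(global_w, client_weights, client_grads, sizes,
                       lr=0.1, threshold=0.3):
    """Pick an aggregation rule for one round (assumed selection rule).

    Low divergence  -> FedAvg: weighted average of locally trained weights.
    High divergence -> FedSGD: one server step on the weighted mean gradient.
    Returns (new_global_weights, method_name, divergence_score).
    """
    score = divergence(client_grads)
    if score > threshold:
        mean_grad = weighted_mean(client_grads, sizes)
        new_w = [w - lr * g for w, g in zip(global_w, mean_grad)]
        return new_w, "FedSGD", score
    return weighted_mean(client_weights, sizes), "FedAvg", score
```

In this sketch each client would report both its locally trained weights and a gradient each round, so the server can apply either rule. A real system would report only what the chosen rule needs, and would work on full model tensors rather than flat lists.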
ISSN: 2196-1115