Hybrid Uncertainty Metrics-Based Privacy-Preserving Alternating Multimodal Representation Learning
| Main Authors: | |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Applied Sciences |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2076-3417/15/10/5229 |
| Summary: | Multimodal learning enhances model performance by integrating heterogeneous data but is hindered by modality laziness and privacy vulnerabilities. Modality laziness occurs when the model overly relies on a single modality for predictions, underutilizing other modalities and leading to suboptimal performance and poor cross-modal integration. Privacy vulnerabilities arise when sensitive data from individual modalities are exposed during training or inference, risking unauthorized access or attacks, especially in shared model components. In this paper, we propose Privacy-Preserving Alternating Multimodal Representation Learning (PAMRL). Built on Multimodal Learning with Alternating Unimodal Adaptation (MLA), PAMRL alternately optimizes unimodal encoders and a shared representation head to mitigate modality laziness and improve cross-modal consistency. It introduces a hybrid uncertainty metric combining KL divergence and entropy to enhance prediction robustness while applying differential privacy to protect sensitive data in unimodal encoders, preserving the shared head for efficient cross-modal fusion. Extensive experiments on the MVSA and CREMA-D datasets, comparing PAMRL with MLA and other baselines, demonstrate its superior performance, achieving an optimal balance of predictive accuracy, attack resilience, and privacy protection, thus supporting secure, efficient multimodal applications. |
| ISSN: | 2076-3417 |
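The summary says PAMRL combines KL divergence and entropy into a hybrid uncertainty metric but does not give the formula. The sketch below is one plausible reading, not the authors' definition. It assumes the score is a weighted sum `alpha * H(p_m) + (1 - alpha) * KL(p_m || p_ref)`, where `p_m` is a modality's predicted class distribution and `p_ref` is a reference distribution (for example, the fused prediction). It also assumes the scores are turned into fusion weights via a softmax over negative uncertainty. `alpha`, `p_ref`, `temperature`, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def entropy(p: Sequence[float], eps: float = 1e-12) -> float:
    """Shannon entropy H(p) in nats; higher means a less confident prediction."""
    return -sum(pi * math.log(pi + eps) for pi in p)


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) in nats; measures how far p departs from the reference q."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def hybrid_uncertainty(p_modal: Sequence[float], p_ref: Sequence[float],
                       alpha: float = 0.5) -> float:
    """Hypothetical hybrid score: weighted sum of entropy and KL to a reference."""
    return alpha * entropy(p_modal) + (1.0 - alpha) * kl_divergence(p_modal, p_ref)


def fusion_weights(modal_probs: Dict[str, List[float]], p_ref: Sequence[float],
                   alpha: float = 0.5, temperature: float = 1.0) -> Dict[str, float]:
    """Map per-modality uncertainty to weights that sum to 1 (lower uncertainty, higher weight)."""
    scores = {m: hybrid_uncertainty(p, p_ref, alpha) for m, p in modal_probs.items()}
    exps = {m: math.exp(-s / temperature) for m, s in scores.items()}
    total = sum(exps.values())
    return {m: e / total for m, e in exps.items()}


if __name__ == "__main__":
    probs = {"image": [0.8, 0.1, 0.1], "text": [0.4, 0.3, 0.3]}
    reference = [0.6, 0.2, 0.2]  # e.g. the average of the two unimodal predictions
    print(fusion_weights(probs, reference))
```

With these inputs the more confident image prediction receives the larger weight. The entropy term penalizes diffuse predictions, while the KL term penalizes a modality that disagrees with the reference.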
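The summary states that differential privacy is applied to the unimodal encoders while the shared head is left unperturbed, without naming the mechanism. A common way to do this is DP-SGD: clip each example's gradient to L2 norm `C`, sum, add Gaussian noise with standard deviation `sigma * C`, and average. The sketch below assumes that mechanism. It is not confirmed as PAMRL's exact procedure. The names `max_norm`, `noise_multiplier`, and the helper functions are hypothetical, and no privacy accounting (computing epsilon) is performed.

```python
import math
import random
from typing import List

Vector = List[float]


def l2_norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def clip(v: Vector, max_norm: float) -> Vector:
    """Scale v down so its L2 norm is at most max_norm; leave it unchanged otherwise."""
    norm = l2_norm(v)
    scale = 1.0 if norm <= max_norm else max_norm / norm
    return [x * scale for x in v]


def dp_aggregate(per_example_grads: List[Vector], max_norm: float,
                 noise_multiplier: float, rng: random.Random) -> Vector:
    """DP-SGD style aggregation: per-example clipping, summation, Gaussian noise, averaging."""
    n = len(per_example_grads)
    summed = [0.0] * len(per_example_grads[0])
    for g in per_example_grads:
        for i, x in enumerate(clip(g, max_norm)):
            summed[i] += x
    std = noise_multiplier * max_norm
    return [(s + rng.gauss(0.0, std)) / n for s in summed]


def plain_aggregate(per_example_grads: List[Vector]) -> Vector:
    """Ordinary mean gradient, used here for the shared head (no noise added)."""
    n = len(per_example_grads)
    return [sum(col) / n for col in zip(*per_example_grads)]


def sgd_step(params: Vector, grads: Vector, lr: float) -> Vector:
    return [p - lr * g for p, g in zip(params, grads)]


if __name__ == "__main__":
    rng = random.Random(0)
    encoder_grads = [[3.0, 4.0], [0.0, 1.0]]  # per-example gradients for one encoder
    head_grads = [[0.5, -0.5], [0.1, 0.3]]    # per-example gradients for the shared head
    encoder = sgd_step([1.0, 1.0], dp_aggregate(encoder_grads, 1.0, 1.1, rng), lr=0.1)
    head = sgd_step([0.0, 0.0], plain_aggregate(head_grads), lr=0.1)
    print(encoder, head)
```

Keeping the head noise-free follows the summary's description of preserving the shared head for cross-modal fusion. In a real training loop, a library such as Opacus would typically handle per-sample gradients and track the privacy budget.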
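The summary describes PAMRL, following MLA, as alternately optimizing unimodal encoders and a shared head. The toy sketch below illustrates only that scheduling idea. It cycles through the modalities, and each step trains on one modality alone, updating that modality's encoder together with the shared head. Every other modality's encoder is left untouched during that step. The scalar "encoders", the linear head, the squared-error loss, and all names are hypothetical simplifications, not the paper's architecture.

```python
from typing import Dict, List, Tuple

Sample = Tuple[Dict[str, float], float]  # (per-modality scalar feature, target)


def unimodal_mse(enc_weight: float, head: float, data: List[Sample], modality: str) -> float:
    """Mean squared error when predicting from one modality: y_hat = head * (enc * x_m)."""
    return sum((head * enc_weight * x[modality] - y) ** 2 for x, y in data) / len(data)


def train_alternating(data: List[Sample], modalities: List[str],
                      epochs: int, lr: float) -> Tuple[Dict[str, float], float]:
    """Round-robin over modalities; each step updates one encoder plus the shared head."""
    enc = {m: 0.5 for m in modalities}  # hypothetical scalar encoders
    head = 0.5                          # hypothetical shared linear head
    n = len(data)
    for _ in range(epochs):
        for m in modalities:  # alternate: only modality m is used in this step
            grad_enc = 0.0
            grad_head = 0.0
            for x, y in data:
                z = enc[m] * x[m]  # unimodal representation
                err = head * z - y
                grad_enc += 2.0 * err * head * x[m]
                grad_head += 2.0 * err * z
            enc[m] -= lr * grad_enc / n
            head -= lr * grad_head / n
    return enc, head


if __name__ == "__main__":
    # Both modalities carry the same signal; the target is 2 * t.
    data = [({"image": t, "text": t}, 2.0 * t) for t in (-1.0, -0.5, 0.5, 1.0)]
    enc, head = train_alternating(data, ["image", "text"], epochs=500, lr=0.1)
    for m in enc:
        print(m, unimodal_mse(enc[m], head, data, m))
```

Because each step optimizes one modality in isolation, no single modality can dominate the joint gradient. This is the intuition the summary gives for how alternating optimization counters modality laziness, while the shared head is still trained on every modality.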