Hybrid Uncertainty Metrics-Based Privacy-Preserving Alternating Multimodal Representation Learning

Bibliographic Details
Main Authors: Zhe Sun, Yaowei Huang, Aohai Zhang, Chao Li, Lifan Jiang, Xiaotong Liao, Ran Li, Junping Wan
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Applied Sciences
Online Access:https://www.mdpi.com/2076-3417/15/10/5229
Description
Summary: Multimodal learning enhances model performance by integrating heterogeneous data, but it is hindered by modality laziness and privacy vulnerabilities. Modality laziness occurs when the model relies too heavily on a single modality for predictions, underutilizing the other modalities and leading to suboptimal performance and poor cross-modal integration. Privacy vulnerabilities arise when sensitive data from individual modalities are exposed during training or inference, risking unauthorized access or attacks, especially in shared model components. In this paper, we propose Privacy-Preserving Alternating Multimodal Representation Learning (PAMRL). Built on Multimodal Learning with Alternating Unimodal Adaptation (MLA), PAMRL alternately optimizes the unimodal encoders and a shared representation head to mitigate modality laziness and improve cross-modal consistency. It introduces a hybrid uncertainty metric that combines KL divergence and entropy to make predictions more robust, and it applies differential privacy to the unimodal encoders to protect sensitive data while preserving the shared head for efficient cross-modal fusion. Extensive experiments on the MVSA and CREMA-D datasets, comparing PAMRL with MLA and other baselines, demonstrate its superior performance and an optimal balance of predictive accuracy, attack resilience, and privacy protection, supporting secure and efficient multimodal applications.
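The summary does not give the exact formula for the hybrid uncertainty metric or name the specific differential-privacy mechanism, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the metric is a weighted sum alpha * H(p) + (1 - alpha) * KL(p || q), combining the entropy of a unimodal prediction p with its KL divergence from a reference distribution q (here, hypothetically, the averaged fused prediction). For the privacy side it substitutes DP-SGD-style per-sample gradient clipping plus Gaussian noise as a stand-in for whatever mechanism protects the unimodal encoders. All names and parameters (`hybrid_uncertainty`, `dp_noisy_gradient`, `alpha`, `clip_norm`, `noise_multiplier`) are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against log(0) and division by zero


def entropy(p):
    """Shannon entropy (nats) of each row of a probability matrix."""
    p = np.clip(p, EPS, 1.0)
    return -np.sum(p * np.log(p), axis=-1)


def kl_divergence(p, q):
    """Row-wise KL(p || q) in nats."""
    p = np.clip(p, EPS, 1.0)
    q = np.clip(q, EPS, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)


def hybrid_uncertainty(p_unimodal, p_reference, alpha=0.5):
    """Hypothetical hybrid score: alpha * H(p) + (1 - alpha) * KL(p || ref).

    Higher entropy (an unsure modality) and higher divergence from the
    reference (an inconsistent modality) both raise the score.
    The weighting scheme is an assumption, not taken from the paper.
    """
    return alpha * entropy(p_unimodal) + (1.0 - alpha) * kl_divergence(
        p_unimodal, p_reference
    )


def dp_noisy_gradient(per_sample_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """DP-SGD-style aggregation (a stand-in, not confirmed as PAMRL's mechanism).

    1. Clip each per-sample gradient (one row) to L2 norm <= clip_norm.
    2. Sum the clipped gradients.
    3. Add Gaussian noise with std = noise_multiplier * clip_norm.
    4. Average over the batch.
    """
    rng = np.random.default_rng() if rng is None else rng
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, EPS))
    clipped = per_sample_grads * scale
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=per_sample_grads.shape[1])
    return (clipped.sum(axis=0) + noise) / per_sample_grads.shape[0]


if __name__ == "__main__":
    # Toy softmax outputs for two modalities on a 3-class task (made-up numbers).
    p_text = np.array([[0.70, 0.20, 0.10]])
    p_image = np.array([[0.40, 0.35, 0.25]])
    p_fused = (p_text + p_image) / 2.0  # assumed reference: simple average fusion
    print("text score: ", hybrid_uncertainty(p_text, p_fused))
    print("image score:", hybrid_uncertainty(p_image, p_fused))

    # Random per-sample gradients for 8 samples and 4 encoder parameters.
    grads = np.random.default_rng(0).normal(size=(8, 4))
    print("private grad:", dp_noisy_gradient(grads, rng=np.random.default_rng(1)))
```

In this sketch, a lower hybrid score marks a more confident and more consistent modality, which a fusion step could weight more heavily. The noise multiplier trades utility for privacy; converting it into a formal (epsilon, delta) guarantee would require a privacy accountant, which is not shown here.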
ISSN:2076-3417