CausMatch: Causal Matching Learning With Counterfactual Preference Framework for Cross-Modal Retrieval
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
Subjects: | |
Online Access: | https://ieeexplore.ieee.org/document/10843200/ |
Summary: | Cross-modal retrieval shows significant promise within the realm of multimedia analysis. Numerous sophisticated techniques have been widely adopted that harness attention mechanisms to establish cross-modal correspondence in matching tasks. However, most existing methods learn cross-modal attention based on conventional likelihood, so the fine-grained matching between regions and words contains numerous invalid local relationships and false global connections, which negatively affects alignment. Different from these methods, this paper proposes a novel Causal Matching Learning (CausMatch) framework with counterfactual preference for cross-modal retrieval. The work seeks to ascertain the matching relation by incorporating a counterfactual causality preference; enhancing the quality of attention and providing a robust supervisory signal for the learning process are its pivotal objectives. Specifically, the study investigates the influence of the learned visual and textual attention on network predictions by employing counterfactual intervention as a means of scrutiny, discerning and analyzing its effects on the learning process. The approach maximizes this positive influence to incentivize the network to learn relationships and connections conducive to fine-grained cross-modal retrieval. The effectiveness of the proposed CausMatch model is systematically substantiated through a comprehensive series of experiments on two widely recognized benchmark datasets, MS-COCO and Flickr30K. The results demonstrate its superiority over existing state-of-the-art methods, underscoring its robust performance in cross-modal retrieval tasks. |
---|---|
ISSN: | 2169-3536 |
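
As a rough illustration of the counterfactual preference idea described in the summary above, the sketch below compares a matching score computed with the learned cross-modal attention against the score obtained under a randomly intervened (counterfactual) attention, and turns the difference into a training signal. This is a minimal, hypothetical reconstruction, not the authors' implementation: the function name `counterfactual_attention_loss`, the tensor shapes, the random-attention intervention, and the `score_fn` callable are all assumptions made for illustration.

```python
# Hypothetical sketch of a counterfactual-attention objective for cross-modal
# matching. Not the CausMatch code; shapes and names are illustrative only.
import torch
import torch.nn.functional as F


def counterfactual_attention_loss(regions, words, attn_logits, score_fn):
    """
    regions:     (B, R, D) image region features
    words:       (B, W, D) caption word features
    attn_logits: (B, W, R) learned word-to-region attention logits
    score_fn:    callable (attended_regions, words) -> matching score of shape (B,)
    """
    # Factual branch: matching score under the learned attention.
    attn = F.softmax(attn_logits, dim=-1)        # (B, W, R)
    attended = torch.bmm(attn, regions)          # (B, W, D)
    score_fact = score_fn(attended, words)       # (B,)

    # Counterfactual branch: intervene by replacing the learned attention
    # with random weights (no gradient through the intervention itself).
    with torch.no_grad():
        rand_attn = F.softmax(torch.rand_like(attn_logits), dim=-1)
    attended_cf = torch.bmm(rand_attn, regions)
    score_cf = score_fn(attended_cf, words)

    # Causal effect of the learned attention on the prediction: how much the
    # score changes when the attention is replaced by the counterfactual one.
    effect = score_fact - score_cf

    # Maximize the positive effect, i.e. reward attention that genuinely
    # contributes to the matching decision.
    return -effect.mean()
```

In practice such a term would presumably be added to a standard retrieval loss (e.g. a triplet ranking loss), so that the network is rewarded both for matching correctly and for relying on attention that has a measurable causal effect on its predictions.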