CausMatch: Causal Matching Learning With Counterfactual Preference Framework for Cross-Modal Retrieval

Bibliographic Details
Main Authors: Chen Chen, Dan Wang
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects: Multimodal learning; cross-modal retrieval; causal learning; attention augmentation
Online Access: https://ieeexplore.ieee.org/document/10843200/
author Chen Chen
Dan Wang
collection DOAJ
description Cross-modal retrieval holds significant promise for multimedia analysis, and attention mechanisms are widely used to establish cross-modal correspondence in matching tasks. However, most existing methods learn cross-modal attention from conventional likelihood alone, so the fine-grained matching between regions and words contains many invalid local relationships and false global connections, which degrades alignment. In this paper we propose Causal Matching Learning (CausMatch), a counterfactual preference framework for cross-modal retrieval. The framework determines matching relations by incorporating a counterfactual causality preference; its pivotal objectives are to improve the quality of attention and to provide a robust supervisory signal for learning. Specifically, it uses counterfactual intervention to measure how the learned visual and textual attention influences the network's predictions, and it maximizes this positive influence so that the network learns relationships and connections that genuinely support fine-grained cross-modal retrieval. The effectiveness of CausMatch is demonstrated through comprehensive experiments on two widely used benchmark datasets, MS-COCO and Flickr30K, where it consistently outperforms existing state-of-the-art methods.
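
To make the counterfactual-intervention idea above concrete, the following is a minimal PyTorch sketch, not the authors' implementation: all function names are hypothetical, and the uniform-random counterfactual attention is only one plausible choice of intervention. It scores an image-text pair once with the learned cross-modal attention and once with a counterfactual attention; the difference is the attention's causal effect, the quantity a CausMatch-style objective would maximize.

import torch
import torch.nn.functional as F

def similarity(regions, words, attn):
    # regions: (n_r, d) image-region features; words: (n_w, d) word features;
    # attn: (n_w, n_r) cross-modal attention, each row summing to 1.
    attended = attn @ regions                                   # (n_w, d) region context per word
    return F.cosine_similarity(attended, words, dim=-1).mean()  # global matching score

def counterfactual_effect(regions, words, attn):
    # Effect = Y(A) - Y(A_cf): the score under the learned attention minus
    # the score under a random counterfactual attention (hypothetical choice
    # of intervention) over the same regions.
    y_fact = similarity(regions, words, attn)
    attn_cf = torch.softmax(torch.rand_like(attn), dim=-1)
    y_cf = similarity(regions, words, attn_cf)
    return y_fact - y_cf

A training loop would negate this effect and add it to the matching loss, rewarding the network only for attention that causally improves the image-text similarity, rather than attention that merely co-occurs with high likelihood.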
format Article
id doaj-art-12262b8b61614f6dbb0c719651211b4e
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling Chen Chen (ORCID: 0000-0003-0579-2353; Taihu Laboratory of Deepsea Technological Science, Wuxi, China) and Dan Wang (ORCID: 0000-0001-9302-3233; State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an, China), "CausMatch: Causal Matching Learning With Counterfactual Preference Framework for Cross-Modal Retrieval," IEEE Access, vol. 13, pp. 12734-12745, 2025, doi: 10.1109/ACCESS.2025.3529942 (IEEE article no. 10843200). Record doaj-art-12262b8b61614f6dbb0c719651211b4e, updated 2025-01-25.
title CausMatch: Causal Matching Learning With Counterfactual Preference Framework for Cross-Modal Retrieval
topic Multimodal learning
cross-modal retrieval
causal learning
attention augmentation
url https://ieeexplore.ieee.org/document/10843200/