CausMatch: Causal Matching Learning With Counterfactual Preference Framework for Cross-Modal Retrieval

Bibliographic Details
Main Authors: Chen Chen, Dan Wang
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects: Multimodal learning; cross-modal retrieval; causal learning; attention augmentation
Online Access: https://ieeexplore.ieee.org/document/10843200/
author Chen Chen
Dan Wang
collection DOAJ
description Cross-modal retrieval holds significant promise for multimedia analysis, and attention mechanisms are widely used to establish cross-modal correspondence in matching tasks. However, most existing methods learn cross-modal attention from conventional likelihood alone, so the fine-grained matching between regions and words contains many invalid local relationships and false global connections, which degrades alignment. In this paper we propose Causal Matching Learning (CausMatch), a counterfactual preference framework for cross-modal retrieval. The framework determines matching relations by incorporating a counterfactual causality preference; its pivotal objectives are to improve the quality of attention and to provide a robust supervisory signal for learning. Specifically, it uses counterfactual intervention to measure how the learned visual and textual attention influences the network's predictions, and it maximizes this positive influence so that the network learns relationships and connections that genuinely support fine-grained cross-modal retrieval. The effectiveness of CausMatch is demonstrated through comprehensive experiments on two widely used benchmark datasets, MS-COCO and Flickr30K, where it consistently outperforms existing state-of-the-art methods.
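
To make the counterfactual-intervention idea above concrete, the following is a minimal PyTorch sketch, not the authors' implementation: all function names are hypothetical, and the uniform-random counterfactual attention is only one plausible choice of intervention. It scores an image-text pair once with the learned cross-modal attention and once with a counterfactual attention; the difference is the attention's causal effect, the quantity a CausMatch-style objective would maximize.

import torch
import torch.nn.functional as F

def similarity(regions, words, attn):
    # regions: (n_r, d) image-region features; words: (n_w, d) word features;
    # attn: (n_w, n_r) cross-modal attention, each row summing to 1.
    attended = attn @ regions                                   # (n_w, d) region context per word
    return F.cosine_similarity(attended, words, dim=-1).mean()  # global matching score

def counterfactual_effect(regions, words, attn):
    # Effect = Y(A) - Y(A_cf): the score under the learned attention minus
    # the score under a random counterfactual attention (hypothetical choice
    # of intervention) over the same regions.
    y_fact = similarity(regions, words, attn)
    attn_cf = torch.softmax(torch.rand_like(attn), dim=-1)
    y_cf = similarity(regions, words, attn_cf)
    return y_fact - y_cf

A training loop would negate this effect and add it to the matching loss, rewarding the network only for attention that causally improves the image-text similarity, rather than attention that merely co-occurs with high likelihood.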
format Article
id doaj-art-12262b8b61614f6dbb0c719651211b4e
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling Chen Chen (ORCID: 0000-0003-0579-2353; Taihu Laboratory of Deepsea Technological Science, Wuxi, China) and Dan Wang (ORCID: 0000-0001-9302-3233; State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an, China), "CausMatch: Causal Matching Learning With Counterfactual Preference Framework for Cross-Modal Retrieval," IEEE Access, vol. 13, pp. 12734-12745, 2025, doi: 10.1109/ACCESS.2025.3529942 (IEEE article no. 10843200). Record doaj-art-12262b8b61614f6dbb0c719651211b4e, updated 2025-01-25.
title CausMatch: Causal Matching Learning With Counterfactual Preference Framework for Cross-Modal Retrieval
topic Multimodal learning
cross-modal retrieval
causal learning
attention augmentation
url https://ieeexplore.ieee.org/document/10843200/