CausMatch: Causal Matching Learning With Counterfactual Preference Framework for Cross-Modal Retrieval
Cross-modal retrieval holds significant promise for multimedia analysis, and many recent methods rely on attention mechanisms to establish cross-modal correspondence in matching tasks. However, most existing methods learn cross-mod...
Main Authors: | Chen Chen, Dan Wang |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
Subjects: | Multimodal learning; cross-modal retrieval; causal learning; attention augmentation |
Online Access: | https://ieeexplore.ieee.org/document/10843200/ |
_version_ | 1832586839993614336 |
---|---|
author | Chen Chen; Dan Wang |
author_facet | Chen Chen; Dan Wang |
author_sort | Chen Chen |
collection | DOAJ |
description | Cross-modal retrieval holds significant promise for multimedia analysis, and many recent methods rely on attention mechanisms to establish cross-modal correspondence in matching tasks. However, most existing methods learn cross-modal attention from conventional likelihood objectives, so the fine-grained matching between image regions and words contains many invalid local relationships and false global connections, which degrades alignment. In contrast, this paper proposes Causal Matching Learning (CausMatch), a counterfactual preference framework for cross-modal retrieval. CausMatch determines the matching relation by incorporating a counterfactual causality preference, with the twin objectives of improving attention quality and providing a robust supervisory signal for learning. Specifically, the study uses counterfactual intervention to measure how the learned visual and textual attention affects network predictions. The framework then maximizes this positive causal effect, encouraging the network to learn the relationships and connections that genuinely support fine-grained cross-modal retrieval. The effectiveness of CausMatch is validated through extensive experiments on two widely used benchmark datasets, MS-COCO and Flickr30K, where it outperforms existing state-of-the-art methods in cross-modal retrieval tasks. |
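The abstract describes measuring the causal effect of learned attention through counterfactual intervention and maximizing that effect as a training signal. The sketch below illustrates this general idea only; the function names, the choice of uniform attention as the counterfactual, and the tensor shapes are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of counterfactual attention intervention for cross-modal
# matching. Everything here (names, shapes, the uniform-attention
# counterfactual) is an assumption for illustration, not CausMatch itself.
import torch
import torch.nn.functional as F

def attended_similarity(regions, words, attention):
    """Score an image-text pair by attending over regions for each word.

    regions:   (n_regions, d)        image region features
    words:     (n_words, d)          word features
    attention: (n_words, n_regions)  attention weights (rows sum to 1)
    """
    attended = attention @ regions                        # (n_words, d)
    sims = F.cosine_similarity(attended, words, dim=-1)   # (n_words,)
    return sims.mean()

def counterfactual_effect(regions, words, learned_attention):
    """Causal effect of the learned attention: the factual matching score
    minus the score under an intervention that replaces attention with a
    uniform distribution. Maximizing this difference pushes the network to
    learn attention that genuinely contributes to the matching score."""
    factual = attended_similarity(regions, words, learned_attention)
    uniform = torch.full_like(learned_attention, 1.0 / learned_attention.size(-1))
    counterfactual = attended_similarity(regions, words, uniform)
    return factual - counterfactual

# Toy usage with random features and a softmax attention map.
regions = torch.randn(36, 256)
words = torch.randn(12, 256)
learned_attention = torch.softmax(torch.randn(12, 36), dim=-1)

effect = counterfactual_effect(regions, words, learned_attention)
loss = -effect  # maximize the positive causal effect of attention
print(f"causal effect of attention: {effect.item():.4f}")
```

In practice such an effect term would be combined with the usual cross-modal matching loss (e.g., a triplet ranking loss), so the supervisory signal rewards both correct matching and attention that is causally responsible for it.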
format | Article |
id | doaj-art-12262b8b61614f6dbb0c719651211b4e |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | Record ID: doaj-art-12262b8b61614f6dbb0c719651211b4e; Indexed: 2025-01-25T00:01:30Z; Language: English; Publisher: IEEE; Series: IEEE Access; ISSN: 2169-3536; Published: 2025-01-01; Volume: 13; Pages: 12734-12745; DOI: 10.1109/ACCESS.2025.3529942; IEEE document: 10843200; Title: CausMatch: Causal Matching Learning With Counterfactual Preference Framework for Cross-Modal Retrieval; Authors: Chen Chen (https://orcid.org/0000-0003-0579-2353), Taihu Laboratory of Deepsea Technological Science, Wuxi, China; Dan Wang (https://orcid.org/0000-0001-9302-3233), State Key Laboratory of Integrated Services Networks, Xidian University, Xi’an, China; Abstract: as given in the description field above; Online access: https://ieeexplore.ieee.org/document/10843200/; Keywords: Multimodal learning; cross-modal retrieval; causal learning; attention augmentation |
spellingShingle | Chen Chen; Dan Wang; CausMatch: Causal Matching Learning With Counterfactual Preference Framework for Cross-Modal Retrieval; IEEE Access; Multimodal learning; cross-modal retrieval; causal learning; attention augmentation |
title | CausMatch: Causal Matching Learning With Counterfactual Preference Framework for Cross-Modal Retrieval |
title_full | CausMatch: Causal Matching Learning With Counterfactual Preference Framework for Cross-Modal Retrieval |
title_fullStr | CausMatch: Causal Matching Learning With Counterfactual Preference Framework for Cross-Modal Retrieval |
title_full_unstemmed | CausMatch: Causal Matching Learning With Counterfactual Preference Framework for Cross-Modal Retrieval |
title_short | CausMatch: Causal Matching Learning With Counterfactual Preference Framework for Cross-Modal Retrieval |
title_sort | causmatch causal matching learning with counterfactual preference framework for cross modal retrieval |
topic | Multimodal learning; cross-modal retrieval; causal learning; attention augmentation |
url | https://ieeexplore.ieee.org/document/10843200/ |
work_keys_str_mv | AT chenchen causmatchcausalmatchinglearningwithcounterfactualpreferenceframeworkforcrossmodalretrieval AT danwang causmatchcausalmatchinglearningwithcounterfactualpreferenceframeworkforcrossmodalretrieval |