Metaphor recognition based on cross-modal multi-level information fusion

Abstract Metaphor is a pervasive linguistic device that has become an active research topic in computer science because of its essential role in the cognitive and communicative processes of language. The rapid expansion of social media has encouraged the growth of multimodal content, and memes, one of the most popular forms of communication on social media, have attracted the attention of many linguists, who believe that memes carry rich metaphorical information. However, multimodal metaphor detection suffers from insufficient information, because the text of a meme is short, and from a lack of effective multimodal fusion methods. To address these problems, we use a single-pass non-autoregressive text generation method to convert images into text, providing additional textual information for the model. In addition, information from the different modalities is fused by a multi-layer fusion module consisting of a prefix guide module and a similarity-aware aggregator. This module reduces the heterogeneity between modalities, learns fine-grained information, and better integrates the feature information of the different modalities. We conducted extensive experiments on the MET-Meme dataset: compared with a strong baseline model, our model improves the weighted F1 score on the three data types of MET-Meme by 1.95%, 1.55%, and 1.72%, respectively. To further demonstrate the effectiveness of the proposed method, we also conducted experiments on a multimodal sarcasm dataset and obtained competitive results.
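
The abstract names the image-to-text step only at a high level. As a rough sketch of what single-pass non-autoregressive generation means here (this is our illustration, not the authors' code; the class name, dimensions, and decoder choice are all assumptions), every caption position is predicted in parallel from one decoder pass, rather than token by token:

```python
# Illustrative sketch of single-pass (non-autoregressive) caption decoding.
# All names and hyperparameters are assumptions for demonstration only.
import torch
import torch.nn as nn

class OneShotCaptionHead(nn.Module):
    def __init__(self, dim: int, vocab_size: int, max_len: int = 20):
        super().__init__()
        # One learnable query per output position, so all positions decode at once.
        self.queries = nn.Parameter(torch.randn(max_len, dim) * 0.02)
        self.decoder = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.to_vocab = nn.Linear(dim, vocab_size)

    def forward(self, image_feats: torch.Tensor) -> torch.Tensor:
        # image_feats: (B, L_img, D) patch features from a visual encoder.
        B = image_feats.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)  # (B, max_len, D)
        h = self.decoder(q, image_feats)                 # cross-attend to the image
        return self.to_vocab(h).argmax(-1)               # (B, max_len) token ids, one pass
```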
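
The two fusion components are likewise only named in the abstract. Below is a minimal sketch of what a prefix-guided cross-attention and a similarity-aware aggregation could look like; the published architecture may differ, and every identifier here is an assumption:

```python
# Minimal sketch of the two fusion ideas named in the abstract; not the
# authors' implementation. All names and shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrefixGuidedFusion(nn.Module):
    """Prepends learnable 'prefix' vectors to the visual sequence so the
    text stream attends to a shared, modality-bridging context."""
    def __init__(self, dim: int, n_prefix: int = 4, n_heads: int = 8):
        super().__init__()
        self.prefix = nn.Parameter(torch.randn(n_prefix, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, text_feats, image_feats):
        # text_feats: (B, L_txt, D); image_feats: (B, L_img, D)
        B = text_feats.size(0)
        prefix = self.prefix.unsqueeze(0).expand(B, -1, -1)  # (B, P, D)
        kv = torch.cat([prefix, image_feats], dim=1)         # (B, P+L_img, D)
        fused, _ = self.cross_attn(text_feats, kv, kv)       # text attends to prefix+image
        return fused

def similarity_aware_aggregate(text_feats, image_feats):
    """Weights image tokens by cosine similarity to the pooled text
    representation, then sums them into one fused vector."""
    text_vec = F.normalize(text_feats.mean(dim=1), dim=-1)   # (B, D)
    img_norm = F.normalize(image_feats, dim=-1)              # (B, L_img, D)
    sim = torch.einsum('bd,bld->bl', text_vec, img_norm)     # (B, L_img)
    weights = sim.softmax(dim=-1).unsqueeze(-1)              # (B, L_img, 1)
    return (weights * image_feats).sum(dim=1)                # (B, D)
```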
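
The reported metric is weighted F1, i.e. per-class F1 averaged with class-support weights. It can be computed with scikit-learn (the labels below are toy values, not the paper's data):

```python
from sklearn.metrics import f1_score

y_true = [0, 1, 1, 2, 2, 2]  # toy gold labels for a 3-class task
y_pred = [0, 1, 2, 2, 2, 1]  # toy predictions
# 'weighted' averages per-class F1 scores, weighting each class by its support.
print(f1_score(y_true, y_pred, average="weighted"))
```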

Bibliographic Details
Main Authors: Qimeng Yang, Yuanbo Yan, Xiaoyu He, Shisong Guo
Author Affiliations: College of Software, Xinjiang University (Yang, Yan, Guo); College of Information Science and Engineering, Xinjiang University (He)
Format: Article
Language: English
Published: Springer, 2024-12-01
Series: Complex & Intelligent Systems, Vol. 11, Iss. 1, pp. 1–16
ISSN: 2199-4536, 2198-6053
Collection: DOAJ
Subjects: Metaphor detection; Multimodal; Information fusion; Meme
Online Access: https://doi.org/10.1007/s40747-024-01684-w