Metaphor recognition based on cross-modal multi-level information fusion

Abstract Metaphor is a pervasive linguistic device that has become an active research topic in computer science because of its essential role in the cognitive and communicative processes of language. The rapid expansion of social media has encouraged the growth of multimodal content, and memes, one of the most popular forms of communication on social media, have attracted the attention of many linguists, who believe that memes carry rich metaphorical information. However, multimodal metaphor detection suffers from insufficient information, because the text of a meme is short, and from a lack of effective multimodal fusion methods. To address these problems, we use a single-pass non-autoregressive text generation method to convert images into text, providing additional textual information for the model. In addition, information from the different modalities is fused by a multi-layer fusion module consisting of a prefix guide module and a similarity-aware aggregator. This module reduces the heterogeneity between modalities, learns fine-grained information, and better integrates the feature information of the different modalities. We conducted extensive experiments on the MET-Meme dataset: compared with a strong baseline model, our model improves the weighted F1 score on the three data types of MET-Meme by 1.95%, 1.55%, and 1.72%, respectively. To further demonstrate the effectiveness of the proposed method, we also conducted experiments on a multimodal sarcasm dataset and obtained competitive results.
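
The abstract names the image-to-text step only at a high level. As a rough sketch of what single-pass non-autoregressive generation means here (this is our illustration, not the authors' code; the class name, dimensions, and decoder choice are all assumptions), every caption position is predicted in parallel from one decoder pass, rather than token by token:

```python
# Illustrative sketch of single-pass (non-autoregressive) caption decoding.
# All names and hyperparameters are assumptions for demonstration only.
import torch
import torch.nn as nn

class OneShotCaptionHead(nn.Module):
    def __init__(self, dim: int, vocab_size: int, max_len: int = 20):
        super().__init__()
        # One learnable query per output position, so all positions decode at once.
        self.queries = nn.Parameter(torch.randn(max_len, dim) * 0.02)
        self.decoder = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.to_vocab = nn.Linear(dim, vocab_size)

    def forward(self, image_feats: torch.Tensor) -> torch.Tensor:
        # image_feats: (B, L_img, D) patch features from a visual encoder.
        B = image_feats.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)  # (B, max_len, D)
        h = self.decoder(q, image_feats)                 # cross-attend to the image
        return self.to_vocab(h).argmax(-1)               # (B, max_len) token ids, one pass
```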
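
The two fusion components are likewise only named in the abstract. Below is a minimal sketch of what a prefix-guided cross-attention and a similarity-aware aggregation could look like; the published architecture may differ, and every identifier here is an assumption:

```python
# Minimal sketch of the two fusion ideas named in the abstract; not the
# authors' implementation. All names and shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrefixGuidedFusion(nn.Module):
    """Prepends learnable 'prefix' vectors to the visual sequence so the
    text stream attends to a shared, modality-bridging context."""
    def __init__(self, dim: int, n_prefix: int = 4, n_heads: int = 8):
        super().__init__()
        self.prefix = nn.Parameter(torch.randn(n_prefix, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, text_feats, image_feats):
        # text_feats: (B, L_txt, D); image_feats: (B, L_img, D)
        B = text_feats.size(0)
        prefix = self.prefix.unsqueeze(0).expand(B, -1, -1)  # (B, P, D)
        kv = torch.cat([prefix, image_feats], dim=1)         # (B, P+L_img, D)
        fused, _ = self.cross_attn(text_feats, kv, kv)       # text attends to prefix+image
        return fused

def similarity_aware_aggregate(text_feats, image_feats):
    """Weights image tokens by cosine similarity to the pooled text
    representation, then sums them into one fused vector."""
    text_vec = F.normalize(text_feats.mean(dim=1), dim=-1)   # (B, D)
    img_norm = F.normalize(image_feats, dim=-1)              # (B, L_img, D)
    sim = torch.einsum('bd,bld->bl', text_vec, img_norm)     # (B, L_img)
    weights = sim.softmax(dim=-1).unsqueeze(-1)              # (B, L_img, 1)
    return (weights * image_feats).sum(dim=1)                # (B, D)
```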
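
The reported metric is weighted F1, i.e. per-class F1 averaged with class-support weights. It can be computed with scikit-learn (the labels below are toy values, not the paper's data):

```python
from sklearn.metrics import f1_score

y_true = [0, 1, 1, 2, 2, 2]  # toy gold labels for a 3-class task
y_pred = [0, 1, 2, 2, 2, 1]  # toy predictions
# 'weighted' averages per-class F1 scores, weighting each class by its support.
print(f1_score(y_true, y_pred, average="weighted"))
```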

Bibliographic Details
Main Authors: Qimeng Yang, Yuanbo Yan, Xiaoyu He, Shisong Guo
Author Affiliations: College of Software, Xinjiang University (Yang, Yan, Guo); College of Information Science and Engineering, Xinjiang University (He)
Format: Article
Language: English
Published: Springer, 2024-12-01
Series: Complex & Intelligent Systems, Vol. 11, Iss. 1, pp. 1–16
ISSN: 2199-4536, 2198-6053
Collection: DOAJ
Subjects: Metaphor detection; Multimodal; Information fusion; Meme
Online Access: https://doi.org/10.1007/s40747-024-01684-w