Metaphor recognition based on cross-modal multi-level information fusion

Bibliographic Details
Main Authors: Qimeng Yang, Yuanbo Yan, Xiaoyu He, Shisong Guo
Format: Article
Language: English
Published: Springer 2024-12-01
Series: Complex & Intelligent Systems
Subjects:
Online Access: https://doi.org/10.1007/s40747-024-01684-w
Description
Summary: Metaphor is a pervasive linguistic device that has become an active research topic in computer science because of its essential role in the cognitive and communicative processes of language. The rapid expansion of social media has also encouraged the growth of multimodal content. As one of the most popular forms of communication on social media, memes have attracted the attention of many linguists, who believe that memes contain rich metaphorical information. However, multimodal metaphor detection suffers from insufficient information because meme texts are short, and it lacks effective multimodal fusion methods. To address these problems, we utilize a single-pass non-autoregressive text generation method to convert images into text, providing the model with additional textual information. In addition, information from the different modalities is fused by a multi-layer fusion module consisting of a prefix guide module and a similarity-aware aggregator. This module reduces the heterogeneity between modalities, learns fine-grained information, and better integrates the feature information of the different modalities. We conducted extensive experiments on the MET-Meme dataset. Compared with a strong baseline model, the weighted F1 of our model improved by 1.95%, 1.55%, and 1.72% on the three data types of the MET-Meme dataset, respectively. To further demonstrate the effectiveness of the proposed method, we also conducted experiments on a multimodal sarcasm dataset and obtained competitive results.
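
The abstract names a similarity-aware aggregator inside the fusion module but gives no implementation details. The sketch below is only an illustration of that general idea, assuming a cosine-similarity gate over projected text and image embeddings; the class name, dimensions, and gating formulation are hypothetical and are not taken from the paper.

```python
# Minimal, illustrative sketch of similarity-aware fusion of two modality
# embeddings. All names, dimensions, and the gating formulation are
# assumptions for illustration, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimilarityAwareAggregator(nn.Module):
    """Fuse text and image features, weighting the image contribution by
    its cosine similarity to the text representation (hypothetical design)."""

    def __init__(self, text_dim: int, image_dim: int, hidden_dim: int):
        super().__init__()
        # Project both modalities into a shared space to reduce heterogeneity.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.image_proj = nn.Linear(image_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim * 2, 2)  # metaphorical / literal

    def forward(self, text_feat: torch.Tensor, image_feat: torch.Tensor) -> torch.Tensor:
        t = self.text_proj(text_feat)    # (batch, hidden_dim)
        v = self.image_proj(image_feat)  # (batch, hidden_dim)
        # Similarity-aware gate: image features that align with the text
        # are weighted up, dissimilar ones are weighted down.
        sim = F.cosine_similarity(t, v, dim=-1)   # (batch,)
        gate = torch.sigmoid(sim).unsqueeze(-1)   # (batch, 1)
        fused = torch.cat([t, gate * v], dim=-1)  # (batch, 2 * hidden_dim)
        return self.classifier(fused)


if __name__ == "__main__":
    model = SimilarityAwareAggregator(text_dim=768, image_dim=512, hidden_dim=256)
    text_feat = torch.randn(4, 768)   # e.g. sentence embeddings of meme text
    image_feat = torch.randn(4, 512)  # e.g. visual features of the meme image
    logits = model(text_feat, image_feat)
    print(logits.shape)               # torch.Size([4, 2])
```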
ISSN: 2199-4536, 2198-6053