Metaphor recognition based on cross-modal multi-level information fusion
Main Authors: | Qimeng Yang, Yuanbo Yan, Xiaoyu He, Shisong Guo |
---|---|
Format: | Article |
Language: | English |
Published: | Springer, 2024-12-01 |
Series: | Complex & Intelligent Systems |
Subjects: | Metaphor detection; Multimodal; Information fusion; Meme |
Online Access: | https://doi.org/10.1007/s40747-024-01684-w |
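The abstract of this record mentions a "similarity-aware aggregator" that fuses features from different modalities. The paper's actual architecture is not reproduced in this record; the following is only a loose, hypothetical sketch (the function names, feature dimensions, and the softmax-over-cosine-similarity weighting are all assumptions, not the authors' implementation) of how auxiliary modality features could be weighted by their similarity to the text features before fusion:

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def similarity_aware_fusion(text_feat, image_feat, caption_feat):
    # Hypothetical aggregation: each auxiliary modality (image features,
    # features of the caption generated from the image) is weighted by the
    # softmax-normalized cosine similarity to the text features, then the
    # weighted features are added to the text representation.
    sims = np.array([cosine_sim(text_feat, image_feat),
                     cosine_sim(text_feat, caption_feat)])
    weights = np.exp(sims) / np.exp(sims).sum()  # softmax over similarities
    fused = text_feat + weights[0] * image_feat + weights[1] * caption_feat
    return fused, weights

rng = np.random.default_rng(0)
t, v, c = rng.normal(size=(3, 8))  # toy text / image / caption features
fused, w = similarity_aware_fusion(t, v, c)
print(fused.shape, w.sum())  # fused keeps the text dimension; weights sum to 1
```

The softmax keeps the aggregation weights positive and normalized, so a modality that disagrees strongly with the text contributes less to the fused representation.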
author | Qimeng Yang, Yuanbo Yan, Xiaoyu He, Shisong Guo |
collection | DOAJ |
description | Abstract Metaphor is a pervasive linguistic device that has become an active research topic in computational linguistics because of its essential role in the cognitive and communicative processes of language. The rapid expansion of social media has encouraged the growth of multimodal content. As the most popular form of communication on social media, memes have attracted the attention of many linguists, who believe that they carry rich metaphorical information. However, multimodal metaphor detection suffers both from insufficient information, because the text of a meme is short, and from the lack of effective multimodal fusion methods. To address these problems, we use a single-pass non-autoregressive text generation method to convert images into text, providing additional textual information for the model. In addition, the information from the different modalities is fused by a multi-layer fusion module consisting of a prefix guide module and a similarity-aware aggregator. This module reduces the heterogeneity between modalities, learns fine-grained information, and better integrates the characteristic information of each modality. We conducted extensive experiments on the MET-Meme dataset. Compared with the strongest baseline model, the weighted F1 of our model improved by 1.95%, 1.55%, and 1.72% on the three data types of the MET-Meme dataset, respectively. To further demonstrate the effectiveness of the proposed method, we also conducted experiments on a multimodal sarcasm dataset and obtained competitive results. |
format | Article |
id | doaj-art-5676beb30f08493aaa08de680d3d26d0 |
institution | Kabale University |
issn | 2199-4536, 2198-6053 |
language | English |
publishDate | 2024-12-01 |
publisher | Springer |
record_format | Article |
series | Complex & Intelligent Systems |
spelling | doaj-art-5676beb30f08493aaa08de680d3d26d0, 2025-02-02T12:50:12Z. Springer, Complex & Intelligent Systems (ISSN 2199-4536, 2198-6053), 2024-12-01, doi:10.1007/s40747-024-01684-w. Metaphor recognition based on cross-modal multi-level information fusion. Qimeng Yang, Yuanbo Yan, Shisong Guo (College of Software, Xinjiang University); Xiaoyu He (College of Information Science and Engineering, Xinjiang University). Keywords: Metaphor detection; Multimodal; Information fusion; Meme. |
title | Metaphor recognition based on cross-modal multi-level information fusion |
topic | Metaphor detection; Multimodal; Information fusion; Meme |
url | https://doi.org/10.1007/s40747-024-01684-w |
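The record's abstract reports results as weighted F1, i.e. per-class F1 scores averaged with each class weighted by its support. For reference, a minimal self-contained computation (equivalent in behavior to scikit-learn's `f1_score(average='weighted')`; the example labels are invented, not from the dataset):

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    # Per-class F1, averaged with weights proportional to class support.
    support = Counter(y_true)
    total = 0.0
    for lab in set(y_true):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p == lab)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != lab and p == lab)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p != lab)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        total += support[lab] * f1
    return total / len(y_true)

# Toy binary example (hypothetical labels):
y_true = ["metaphor", "literal", "metaphor", "metaphor", "literal"]
y_pred = ["metaphor", "literal", "literal", "metaphor", "literal"]
print(weighted_f1(y_true, y_pred))  # → 0.8
```

Unlike macro F1, the weighted average is not distorted by rare classes, which matters for imbalanced meme datasets.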