Af-CAN: Multimodal Emotion Recognition Method Based on Situational Attention Mechanism

In the pursuit of developing an efficient and harmonious human-computer interaction interface, Emotion Recognition in Conversations (ERC) is particularly important. It requires the system to delicately capture and understand the nuances of human emotional fluctuations during the communication process.


Bibliographic Details
Main Authors: Xue Zhang, Mingjiang Wang, Xiao Zeng, Xuyi Zhuang
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects: Emotion recognition; transfer learning; multimodal
Online Access:https://ieeexplore.ieee.org/document/10701560/
author Xue Zhang
Mingjiang Wang
Xiao Zeng
Xuyi Zhuang
collection DOAJ
description In the pursuit of an efficient and harmonious human-computer interaction interface, Emotion Recognition in Conversations (ERC) is particularly important: it requires a system to capture and understand the nuances of human emotional fluctuation during communication. Although emotional signals are present across conversational modalities such as audio, video, and text, multimodal ERC remains a challenging problem because of its inherent complexity. Previous research has tended to rely on a single modality, particularly text, neglecting the rich emotional cues present in audio and video. To address open challenges such as inadequate extraction of contextual emotional dynamics and data scarcity, a multimodal emotion recognition method called the Attention-based Fusion Contextual Attention Network (Af-CAN) is proposed to break through these limitations. Af-CAN combines a multimodal feature fusion mechanism, which extracts emotion-relevant features from the different information sources, with attention mechanisms that integrate those features, improving the comprehensiveness and accuracy of emotion recognition. In response to the emotional dynamics and context dependency of conversations, the framework also introduces a dedicated context modeling unit that tracks the evolution of emotional states across the conversation and the mutual influence of emotions between speakers. Experimental evaluations on multiple standard datasets show that Af-CAN outperforms existing ERC systems on various evaluation metrics, with particular advantages in handling complex emotional changes in conversations, laying a solid foundation for advancing the application of emotional intelligence in human-computer interaction.
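The attention-based fusion described in the abstract can be illustrated with a minimal sketch. The paper's actual architecture is not reproduced in this record, so the feature dimensions, the single learned query vector, and the softmax weighting below are illustrative assumptions, not Af-CAN's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(modal_feats, query):
    """Fuse per-modality feature vectors into one utterance representation.

    modal_feats: (M, d) array, one d-dim vector per modality (e.g. audio,
                 video, and text encoder outputs projected to a common size).
    query:       (d,) learned query vector scoring each modality's relevance.
    Returns the per-modality attention weights (M,) and the fused vector (d,).
    """
    d = modal_feats.shape[1]
    scores = modal_feats @ query / np.sqrt(d)   # scaled dot-product scores
    weights = softmax(scores)                   # one weight per modality
    fused = weights @ modal_feats               # convex combination of modalities
    return weights, fused

# Toy example: three modalities, 8-dim features.
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 8))
query = rng.normal(size=8)
w, fused = attention_fuse(feats, query)
print(w)            # weights are positive and sum to 1
print(fused.shape)
```

In this sketch the softmax makes the fused vector a convex combination of the modality features, so modalities the query scores as more emotion-relevant dominate the fused representation; a trainable system would learn `query` (or per-utterance queries) end to end.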
format Article
id doaj-art-34c1e68b83704be2816e6e23fb554815
institution DOAJ
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling Af-CAN: Multimodal Emotion Recognition Method Based on Situational Attention Mechanism. IEEE Access, vol. 13, pp. 44858-44871, 2025-01-01. DOI: 10.1109/ACCESS.2024.3471613 (IEEE Xplore document 10701560). Record updated 2025-08-20T03:01:28Z.
Authors: Xue Zhang (https://orcid.org/0009-0003-8169-4453), Mingjiang Wang (https://orcid.org/0000-0002-4706-009X), Xiao Zeng, Xuyi Zhuang; all with the Key Laboratory for Key Technologies of IoT Terminals, Harbin Institute of Technology (Shenzhen), Shenzhen, China.
Online access: https://ieeexplore.ieee.org/document/10701560/. Subjects: Emotion recognition; transfer learning; multimodal.
title Af-CAN: Multimodal Emotion Recognition Method Based on Situational Attention Mechanism
topic Emotion recognition
transfer learning
multimodal
url https://ieeexplore.ieee.org/document/10701560/