Af-CAN: Multimodal Emotion Recognition Method Based on Situational Attention Mechanism

Bibliographic Details
Main Authors: Xue Zhang, Mingjiang Wang, Xiao Zeng, Xuyi Zhuang
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10701560/
Description
Summary: In the pursuit of an efficient and harmonious human-computer interaction interface, Emotion Recognition in Conversations (ERC) is particularly important: it requires a system to capture and understand the nuances of human emotional fluctuations during communication. Although emotional signals are present across conversational modalities such as audio, video, and text, multimodal ERC remains a challenging problem because of its inherent complexity. Previous research has tended to rely on a single modality, particularly text, while neglecting the rich emotional cues present in audio and video. To address these limitations, along with the inadequate extraction of contextual emotional dynamics and data scarcity, a multimodal emotion recognition method called the Attention-based Fusion Contextual Attention Network (Af-CAN) is proposed. Af-CAN is designed around a multimodal feature fusion mechanism that extracts emotion-relevant features from different information sources and integrates them with attention mechanisms, supporting comprehensive and accurate emotion recognition. In response to the emotional dynamics and context dependency of conversations, the framework also introduces a dedicated context modeling unit that tracks the evolution of emotional states across a dialogue and the mutual influence of emotions between speakers. Experimental evaluations on multiple standard datasets show that Af-CAN outperforms existing ERC systems on various evaluation metrics, with particular advantages in handling complex emotional changes in conversations, laying a foundation for applying emotional intelligence in human-computer interaction.
ISSN: 2169-3536
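
Note: the summary above names two ingredients, an attention-weighted fusion of per-utterance audio, video, and text features and a context modeling unit that propagates emotional state across the dialogue. The Python sketch below shows one plausible way these pieces could fit together. Every detail here is an assumption for illustration (the module names, the use of a GRU as the context unit, the scalar per-modality attention scores, the dimensions); the paper's actual architecture is not described in this record and may differ substantially.

# Minimal sketch in the spirit of Af-CAN: attention-based fusion of
# audio/video/text utterance features, then a recurrent context tracker.
# All names and design choices are assumptions, not the paper's method.
import torch
import torch.nn as nn


class AttentionFusion(nn.Module):
    """Fuses per-utterance audio/video/text features with learned
    attention weights (an assumed reading of the fusion mechanism)."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.ModuleDict(
            {m: nn.Linear(dim, dim) for m in ("audio", "video", "text")}
        )
        self.score = nn.Linear(dim, 1)  # scalar attention score per modality

    def forward(self, feats: dict[str, torch.Tensor]) -> torch.Tensor:
        # Project each modality, score it, and take a softmax-weighted sum.
        stacked = torch.stack(
            [torch.tanh(self.proj[m](feats[m])) for m in ("audio", "video", "text")],
            dim=1,
        )  # (batch, 3, dim)
        weights = torch.softmax(self.score(stacked), dim=1)  # (batch, 3, 1)
        return (weights * stacked).sum(dim=1)  # (batch, dim)


class ContextEmotionTracker(nn.Module):
    """Tracks how emotional state evolves over a conversation; a GRU
    stands in here for the paper's context modeling unit."""

    def __init__(self, dim: int, n_classes: int):
        super().__init__()
        self.fusion = AttentionFusion(dim)
        self.context = nn.GRU(dim, dim, batch_first=True)
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, feats: dict[str, torch.Tensor]) -> torch.Tensor:
        # feats[m]: (batch, seq_len, dim) utterance features per modality.
        b, t, d = feats["text"].shape
        flat = {m: x.reshape(b * t, d) for m, x in feats.items()}
        fused = self.fusion(flat).reshape(b, t, d)  # fuse each utterance
        ctx, _ = self.context(fused)                # propagate dialogue context
        return self.classifier(ctx)                 # (batch, seq_len, n_classes)


if __name__ == "__main__":
    model = ContextEmotionTracker(dim=128, n_classes=6)
    feats = {m: torch.randn(2, 10, 128) for m in ("audio", "video", "text")}
    print(model(feats).shape)  # torch.Size([2, 10, 6])

One design point this sketch makes concrete: fusing each utterance before running the recurrent unit means the context tracker sees a single combined emotional signal per turn, which is one way a model could capture the cross-turn emotional dynamics the summary emphasizes.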