Af-CAN: Multimodal Emotion Recognition Method Based on Situational Attention Mechanism
In the pursuit of developing an efficient and harmonious human-computer interaction interface, Emotion Recognition in Conversations (ERC) is particularly important. It requires the system to delicately capture and understand the nuances of human emotional fluctuations during the communication process.
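The record's full description (below) outlines two components: attention-based fusion of per-utterance audio, video, and text features, and a context-modeling unit that tracks emotional dynamics across a dialogue. The paper's actual architecture is not given in this record, so the following is only a minimal PyTorch sketch of that general design; every module, name, and dimension here (`AttentionFusion`, `ContextEmotionModel`, the GRU context tracker) is an assumption for illustration, not Af-CAN itself.

```python
# Hypothetical sketch of the two components described in the abstract:
# (1) attention-based fusion of audio/video/text utterance features, and
# (2) a recurrent context unit tracking emotional state across a dialogue.
# All module names, dimensions, and design choices are assumptions; the
# record does not specify the actual Af-CAN architecture.
import torch
import torch.nn as nn


class AttentionFusion(nn.Module):
    """Scores each modality's utterance vector and takes a weighted sum."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, modalities: torch.Tensor) -> torch.Tensor:
        # modalities: (batch, num_modalities, dim)
        weights = torch.softmax(self.score(modalities), dim=1)  # (batch, M, 1)
        return (weights * modalities).sum(dim=1)                # (batch, dim)


class ContextEmotionModel(nn.Module):
    """Fuses per-utterance modality features, then a GRU carries
    conversational context forward before per-turn classification."""

    def __init__(self, dim: int = 128, num_emotions: int = 6):
        super().__init__()
        self.fusion = AttentionFusion(dim)
        self.context = nn.GRU(dim, dim, batch_first=True)
        self.classifier = nn.Linear(dim, num_emotions)

    def forward(self, utterances: torch.Tensor) -> torch.Tensor:
        # utterances: (batch, seq_len, num_modalities, dim), one row per
        # dialogue turn with pre-extracted audio/video/text features.
        b, t, m, d = utterances.shape
        fused = self.fusion(utterances.view(b * t, m, d)).view(b, t, d)
        context, _ = self.context(fused)  # track emotional dynamics over turns
        return self.classifier(context)   # (batch, seq_len, num_emotions)


if __name__ == "__main__":
    model = ContextEmotionModel()
    batch = torch.randn(2, 10, 3, 128)  # 2 dialogues, 10 turns, 3 modalities
    logits = model(batch)
    print(logits.shape)  # torch.Size([2, 10, 6])
```

Attention pooling over modalities lets such a model weight, say, prosody over text when the words are neutral, while the recurrent layer supplies the context dependency the abstract emphasizes.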
| Main Authors: | Xue Zhang, Mingjiang Wang, Xiao Zeng, Xuyi Zhuang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Emotion recognition; transfer learning; multimodal |
| Online Access: | https://ieeexplore.ieee.org/document/10701560/ |
| _version_ | 1850023115994693632 |
|---|---|
| author | Xue Zhang; Mingjiang Wang; Xiao Zeng; Xuyi Zhuang |
| author_facet | Xue Zhang; Mingjiang Wang; Xiao Zeng; Xuyi Zhuang |
| author_sort | Xue Zhang |
| collection | DOAJ |
| description | In the pursuit of developing an efficient and harmonious human-computer interaction interface, Emotion Recognition in Conversations (ERC) is particularly important. It requires the system to delicately capture and understand the nuances of human emotional fluctuations during the communication process. Although emotional signals are present across conversational modalities such as audio, video, and text, multimodal ERC remains a challenging problem due to its inherent complexity. Previous research has tended to rely on a single modality, particularly text, while neglecting the rich emotional cues in audio and video. To address challenges such as inadequate extraction of contextual emotional dynamics and data scarcity, a multimodal emotion recognition method called the Attention-based Fusion Contextual Attention Network (Af-CAN) is proposed to overcome these limitations. Af-CAN is designed around a multimodal feature-fusion mechanism that extracts emotion-relevant features from different information sources and integrates them with attention mechanisms, ensuring comprehensive and accurate emotion recognition. Furthermore, to account for the emotional dynamics and context dependency of conversations, the framework introduces a dedicated context-modeling unit that tracks both the evolution of emotional states over a conversation and the mutual influence of emotions between speakers. Experimental evaluations on multiple standard datasets show that Af-CAN outperforms existing ERC systems on various evaluation metrics, with particularly clear advantages in handling complex emotional changes in conversations, laying a solid foundation for applying emotional intelligence in human-computer interaction. |
| format | Article |
| id | doaj-art-34c1e68b83704be2816e6e23fb554815 |
| institution | DOAJ |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-34c1e68b83704be2816e6e23fb554815; 2025-08-20T03:01:28Z; eng; IEEE; IEEE Access; 2169-3536; 2025-01-01; vol. 13, pp. 44858-44871; doi:10.1109/ACCESS.2024.3471613; document 10701560; Af-CAN: Multimodal Emotion Recognition Method Based on Situational Attention Mechanism; Xue Zhang (https://orcid.org/0009-0003-8169-4453); Mingjiang Wang (https://orcid.org/0000-0002-4706-009X); Xiao Zeng; Xuyi Zhuang; all authors: Key Laboratory for Key Technologies of IoT Terminals, Harbin Institute of Technology (Shenzhen), Shenzhen, China; https://ieeexplore.ieee.org/document/10701560/; Emotion recognition; transfer learning; multimodal |
| spellingShingle | Xue Zhang; Mingjiang Wang; Xiao Zeng; Xuyi Zhuang; Af-CAN: Multimodal Emotion Recognition Method Based on Situational Attention Mechanism; IEEE Access; Emotion recognition; transfer learning; multimodal |
| title | Af-CAN: Multimodal Emotion Recognition Method Based on Situational Attention Mechanism |
| title_full | Af-CAN: Multimodal Emotion Recognition Method Based on Situational Attention Mechanism |
| title_fullStr | Af-CAN: Multimodal Emotion Recognition Method Based on Situational Attention Mechanism |
| title_full_unstemmed | Af-CAN: Multimodal Emotion Recognition Method Based on Situational Attention Mechanism |
| title_short | Af-CAN: Multimodal Emotion Recognition Method Based on Situational Attention Mechanism |
| title_sort | af can multimodal emotion recognition method based on situational attention mechanism |
| topic | Emotion recognition; transfer learning; multimodal |
| url | https://ieeexplore.ieee.org/document/10701560/ |
| work_keys_str_mv | AT xuezhang afcanmultimodalemotionrecognitionmethodbasedonsituationalattentionmechanism AT mingjiangwang afcanmultimodalemotionrecognitionmethodbasedonsituationalattentionmechanism AT xiaozeng afcanmultimodalemotionrecognitionmethodbasedonsituationalattentionmechanism AT xuyizhuang afcanmultimodalemotionrecognitionmethodbasedonsituationalattentionmechanism |