Attention-enhanced multimodal feature fusion network for clothes-changing person re-identification
Main Authors: | Yongkang Ding, Jiechen Li, Hao Wang, Ziang Liu, Anqi Wang |
---|---|
Format: | Article |
Language: | English |
Published: | Springer, 2024-11-01 |
Series: | Complex & Intelligent Systems |
Subjects: | Person re-identification; Clothes-changing scenarios; Computer vision; Image retrieval |
Online Access: | https://doi.org/10.1007/s40747-024-01646-2 |
author | Yongkang Ding; Jiechen Li; Hao Wang; Ziang Liu; Anqi Wang |
collection | DOAJ |
description | Abstract Clothes-Changing Person Re-Identification is a challenging problem in computer vision, primarily due to the appearance variations caused by clothing changes across different camera views. Such variations pose significant challenges to traditional person re-identification techniques that rely on clothing features, including the inconsistency of clothing across sightings and the difficulty of learning reliable clothing-irrelevant local features. To address these issues, we propose a novel network architecture called the Attention-Enhanced Multimodal Feature Fusion Network (AE-Net). AE-Net mitigates the impact of clothing changes on recognition accuracy by integrating RGB global features, grayscale image features, and clothing-irrelevant features obtained through semantic segmentation. Specifically, the global features capture the overall appearance of the person; the grayscale features eliminate the interference of color in recognition; and the clothing-irrelevant features derived from semantic segmentation compel the model to learn representations independent of the person’s clothing. Additionally, we introduce a multi-scale fusion attention mechanism that further enhances the model’s ability to capture both fine detail and global structure, thereby improving recognition accuracy and robustness. Extensive experiments demonstrate that AE-Net outperforms several state-of-the-art methods on the PRCC and LTCC datasets, particularly in scenarios with significant clothing changes, achieving Top-1 accuracy of 60.4% on PRCC and 42.9% on LTCC. |
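The record is prose-only; as a reading aid, here is a minimal PyTorch sketch of the three-branch design the description outlines (RGB, grayscale, and segmentation-masked inputs fused by attention). Every concrete choice below — the ResNet-18 backbones, the 512-dimensional embeddings, the single-scale softmax attention, and the `body_mask` input — is an illustrative assumption, not the paper's actual AE-Net architecture.

```python
# Minimal sketch of a three-branch fusion model in the spirit of AE-Net.
# Backbones, dimensions, and the attention form are assumptions made for
# illustration; they are not taken from the paper.
import torch
import torch.nn as nn
import torchvision.models as models


def to_grayscale(rgb: torch.Tensor) -> torch.Tensor:
    # (B, 3, H, W) RGB -> 3-channel luminance, removing color cues.
    gray = 0.299 * rgb[:, 0] + 0.587 * rgb[:, 1] + 0.114 * rgb[:, 2]
    return gray.unsqueeze(1).expand(-1, 3, -1, -1)


class FusionAttention(nn.Module):
    """Stand-in for the paper's multi-scale fusion attention: learns a
    softmax weight per branch and returns the weighted sum of embeddings."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, num_branches, dim)
        weights = torch.softmax(self.score(feats), dim=1)  # (B, branches, 1)
        return (weights * feats).sum(dim=1)                # (B, dim)


class ThreeBranchReID(nn.Module):
    def __init__(self, num_ids: int, dim: int = 512):
        super().__init__()

        def backbone() -> nn.Module:
            net = models.resnet18(weights=None)
            net.fc = nn.Linear(net.fc.in_features, dim)
            return net

        self.rgb_net = backbone()   # global RGB appearance
        self.gray_net = backbone()  # color-invariant structure
        self.seg_net = backbone()   # clothing-irrelevant regions
        self.fuse = FusionAttention(dim)
        self.classifier = nn.Linear(dim, num_ids)

    def forward(self, rgb: torch.Tensor, body_mask: torch.Tensor):
        # body_mask: (B, 1, H, W) semantic-segmentation mask selecting
        # non-clothing regions; multiplying it in suppresses clothing pixels.
        f_rgb = self.rgb_net(rgb)
        f_gray = self.gray_net(to_grayscale(rgb))
        f_seg = self.seg_net(rgb * body_mask)
        fused = self.fuse(torch.stack([f_rgb, f_gray, f_seg], dim=1))
        return fused, self.classifier(fused)  # retrieval embedding + ID logits
```

Training would typically combine an identity cross-entropy loss on the logits with a metric loss (e.g. triplet) on the fused embedding, and retrieval would compare embeddings by cosine distance; these choices are likewise assumptions, as the abstract does not specify them.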
format | Article |
id | doaj-art-2e8ba9afc8fb474984f7eb0306c70674 |
institution | Kabale University |
issn | 2199-4536; 2198-6053 |
language | English |
publishDate | 2024-11-01 |
publisher | Springer |
series | Complex & Intelligent Systems |
citation | Complex & Intelligent Systems, vol. 11, no. 1, pp. 1–15, 2024-11-01. doi:10.1007/s40747-024-01646-2 |
affiliations | Yongkang Ding and Anqi Wang: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics; Jiechen Li: Engineering in Electrical Engineering, University of Southern California; Hao Wang: School of Computer Science, Carnegie Mellon University; Ziang Liu: Electrical and Computer Engineering, Carnegie Mellon University |
title | Attention-enhanced multimodal feature fusion network for clothes-changing person re-identification |
topic | Person re-identification; Clothes-changing scenarios; Computer vision; Image retrieval |
url | https://doi.org/10.1007/s40747-024-01646-2 |