Attention-enhanced multimodal feature fusion network for clothes-changing person re-identification

Abstract Clothes-Changing Person Re-Identification is a challenging problem in computer vision, primarily due to the appearance variations caused by clothing changes across different camera views. This poses significant challenges to traditional person re-identification techniques that rely on cloth...

Full description

Saved in:
Bibliographic Details
Main Authors: Yongkang Ding, Jiechen Li, Hao Wang, Ziang Liu, Anqi Wang
Format: Article
Language:English
Published: Springer 2024-11-01
Series:Complex & Intelligent Systems
Subjects:
Online Access:https://doi.org/10.1007/s40747-024-01646-2
_version_ 1832571154225692672
author Yongkang Ding
Jiechen Li
Hao Wang
Ziang Liu
Anqi Wang
author_facet Yongkang Ding
Jiechen Li
Hao Wang
Ziang Liu
Anqi Wang
author_sort Yongkang Ding
collection DOAJ
description Abstract Clothes-Changing Person Re-Identification is a challenging problem in computer vision, primarily due to the appearance variations caused by clothing changes across different camera views. This poses significant challenges to traditional person re-identification techniques that rely on clothing features. These challenges include the inconsistency of clothing and the difficulty in learning reliable clothing-irrelevant local features. To address this issue, we propose a novel network architecture called the Attention-Enhanced Multimodal Feature Fusion Network (AE-Net). AE-Net effectively mitigates the impact of clothing changes on recognition accuracy by integrating RGB global features, grayscale image features, and clothing-irrelevant features obtained through semantic segmentation. Specifically, global features capture the overall appearance of the person; grayscale image features help eliminate the interference of color in recognition; and clothing-irrelevant features derived from semantic segmentation force the model to learn features independent of the person’s clothing. Additionally, we introduce a multi-scale fusion attention mechanism that further enhances the model’s ability to capture both detailed and global structures, thereby improving recognition accuracy and robustness. Extensive experimental results demonstrate that AE-Net outperforms several state-of-the-art methods on the PRCC and LTCC datasets, particularly in scenarios with significant clothing changes. On the PRCC and LTCC datasets, AE-Net achieves Top-1 accuracy rates of 60.4% and 42.9%, respectively.
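The abstract describes AE-Net as fusing three feature streams (RGB global, grayscale, and segmentation-derived clothing-irrelevant features) under an attention mechanism. The toy NumPy sketch below is our own illustrative construction, not the authors' implementation: the function name `fuse_streams` and the per-stream scalar scoring weights `score_w` are hypothetical, and it shows only the general idea of attention-weighted fusion of precomputed stream embeddings.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def fuse_streams(rgb_feat, gray_feat, seg_feat, score_w):
    """Attention-weighted fusion of three D-dim feature streams.

    score_w has shape (3, D): one scoring vector per stream. Each
    stream's dot product with its scoring vector gives a scalar
    relevance score; a softmax over the three scores yields the
    attention weights, and the fused descriptor is the weighted sum.
    """
    streams = np.stack([rgb_feat, gray_feat, seg_feat])  # (3, D)
    scores = (streams * score_w).sum(axis=1)             # (3,) relevance scores
    weights = softmax(scores)                            # (3,) attention weights
    return weights @ streams                             # (D,) fused feature

# Usage with random stand-in embeddings (D = 8).
rng = np.random.default_rng(0)
D = 8
fused = fuse_streams(rng.normal(size=D), rng.normal(size=D),
                     rng.normal(size=D), rng.normal(size=(3, D)))
```

In the paper the streams would come from CNN backbones and the fusion would also operate at multiple spatial scales; this sketch collapses that to a single weighted sum purely to make the fusion step concrete.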
format Article
id doaj-art-2e8ba9afc8fb474984f7eb0306c70674
institution Kabale University
issn 2199-4536
2198-6053
language English
publishDate 2024-11-01
publisher Springer
record_format Article
series Complex & Intelligent Systems
spelling doaj-art-2e8ba9afc8fb474984f7eb0306c706742025-02-02T12:49:14ZengSpringerComplex & Intelligent Systems2199-45362198-60532024-11-0111111510.1007/s40747-024-01646-2Attention-enhanced multimodal feature fusion network for clothes-changing person re-identificationYongkang Ding0Jiechen Li1Hao Wang2Ziang Liu3Anqi Wang4College of Computer Science and Technology, Nanjing University of Aeronautics and AstronauticsEngineering in Electrical Engineering, University of Southern CaliforniaSchool of Computer Science, Carnegie Mellon UniversityElectrical and Computer Engineering, Carnegie Mellon UniversityCollege of Computer Science and Technology, Nanjing University of Aeronautics and AstronauticsAbstract Clothes-Changing Person Re-Identification is a challenging problem in computer vision, primarily due to the appearance variations caused by clothing changes across different camera views. This poses significant challenges to traditional person re-identification techniques that rely on clothing features. These challenges include the inconsistency of clothing and the difficulty in learning reliable clothing-irrelevant local features. To address this issue, we propose a novel network architecture called the Attention-Enhanced Multimodal Feature Fusion Network (AE-Net). AE-Net effectively mitigates the impact of clothing changes on recognition accuracy by integrating RGB global features, grayscale image features, and clothing-irrelevant features obtained through semantic segmentation. Specifically, global features capture the overall appearance of the person; grayscale image features help eliminate the interference of color in recognition; and clothing-irrelevant features derived from semantic segmentation force the model to learn features independent of the person’s clothing. 
Additionally, we introduce a multi-scale fusion attention mechanism that further enhances the model’s ability to capture both detailed and global structures, thereby improving recognition accuracy and robustness. Extensive experimental results demonstrate that AE-Net outperforms several state-of-the-art methods on the PRCC and LTCC datasets, particularly in scenarios with significant clothing changes. On the PRCC and LTCC datasets, AE-Net achieves Top-1 accuracy rates of 60.4% and 42.9%, respectively.https://doi.org/10.1007/s40747-024-01646-2Person re-identificationClothes-changing scenariosComputer visionImage retrieval
spellingShingle Yongkang Ding
Jiechen Li
Hao Wang
Ziang Liu
Anqi Wang
Attention-enhanced multimodal feature fusion network for clothes-changing person re-identification
Complex & Intelligent Systems
Person re-identification
Clothes-changing scenarios
Computer vision
Image retrieval
title Attention-enhanced multimodal feature fusion network for clothes-changing person re-identification
title_full Attention-enhanced multimodal feature fusion network for clothes-changing person re-identification
title_fullStr Attention-enhanced multimodal feature fusion network for clothes-changing person re-identification
title_full_unstemmed Attention-enhanced multimodal feature fusion network for clothes-changing person re-identification
title_short Attention-enhanced multimodal feature fusion network for clothes-changing person re-identification
title_sort attention enhanced multimodal feature fusion network for clothes changing person re identification
topic Person re-identification
Clothes-changing scenarios
Computer vision
Image retrieval
url https://doi.org/10.1007/s40747-024-01646-2
work_keys_str_mv AT yongkangding attentionenhancedmultimodalfeaturefusionnetworkforclotheschangingpersonreidentification
AT jiechenli attentionenhancedmultimodalfeaturefusionnetworkforclotheschangingpersonreidentification
AT haowang attentionenhancedmultimodalfeaturefusionnetworkforclotheschangingpersonreidentification
AT ziangliu attentionenhancedmultimodalfeaturefusionnetworkforclotheschangingpersonreidentification
AT anqiwang attentionenhancedmultimodalfeaturefusionnetworkforclotheschangingpersonreidentification