BiFFN: Bi-Frequency Guided Feature Fusion Network for Visible–Infrared Person Re-Identification



Bibliographic Details
Main Authors: Xingyu Cao, Pengxin Ding, Jie Li, Mei Chen
Format: Article
Language: English
Published: MDPI AG 2025-02-01
Series: Sensors
Online Access: https://www.mdpi.com/1424-8220/25/5/1298
Description
Summary: Visible–infrared person re-identification (VI-ReID) aims to match pedestrian images across the visible and infrared modalities by minimizing the gap between them. Existing methods focus primarily on extracting cross-modality features in the spatial domain, which limits how much useful information can be captured. To address this limitation, we propose a novel bi-frequency feature fusion network (BiFFN) that extracts and fuses high-frequency, low-frequency, and spatial-domain features to reduce the modality gap. Unlike conventional approaches that either focus on a single frequency band or rely on simple multi-branch fusion strategies, BiFFN addresses the modality discrepancy through systematic frequency-spatial co-learning. The network introduces a frequency-spatial enhancement (FSE) module to strengthen feature representation in both domains. A deep frequency mining (DFM) module then improves cross-modality information utilization by exploiting the distinct characteristics of the high- and low-frequency components, and a cross-frequency fusion (CFF) module aligns the low-frequency features and fuses them with the high-frequency features to generate intermediate features that carry critical information from each modality. To refine the distribution of identity features in the common space, we develop a unified modality center (UMC) loss, which promotes a more balanced inter-modality distribution while preserving discriminative identity information. Extensive experiments demonstrate that the proposed BiFFN achieves state-of-the-art performance in VI-ReID: a Rank-1 accuracy of 77.5% and an mAP of 75.9% on the SYSU-MM01 dataset under the all-search mode, and a Rank-1 accuracy of 58.5% and an mAP of 63.7% on the LLCM dataset under the IR-VIS mode.
These results confirm that integrating feature fusion with frequency-domain information significantly reduces the modality gap and outperforms previous methods.
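The two core ideas in the abstract — splitting features into high- and low-frequency bands, and pulling per-identity modality centers toward a shared point — can be illustrated with a minimal sketch. This is a hypothetical reading, not the paper's implementation: the frequency split below uses a simple circular Fourier-domain mask, and `unified_modality_center_loss` is only one plausible form of a center-alignment objective; the radius, function names, and exact loss form are assumptions.

```python
import numpy as np

def split_frequencies(feat: np.ndarray, radius: float = 0.25):
    """Split a 2-D feature map into low- and high-frequency parts using a
    centered circular mask in the Fourier domain (illustrative only)."""
    h, w = feat.shape
    spec = np.fft.fftshift(np.fft.fft2(feat))
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    mask = dist <= radius * min(h, w)          # True near the spectrum center
    low = np.fft.ifft2(np.fft.ifftshift(spec * mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spec * (~mask))).real
    return low, high

def unified_modality_center_loss(vis_feats, ir_feats, labels):
    """Hypothetical UMC-style loss: for each identity, pull the visible and
    infrared feature centers toward their shared midpoint. The paper's exact
    formulation may differ."""
    loss, count = 0.0, 0
    for pid in np.unique(labels):
        v_c = vis_feats[labels == pid].mean(axis=0)   # visible-modality center
        i_c = ir_feats[labels == pid].mean(axis=0)    # infrared-modality center
        u_c = (v_c + i_c) / 2                         # unified center
        loss += np.sum((v_c - u_c) ** 2) + np.sum((i_c - u_c) ** 2)
        count += 1
    return loss / count

# The two bands are complementary: they sum back to the original map.
feat = np.random.default_rng(0).standard_normal((32, 32))
low, high = split_frequencies(feat)
assert np.allclose(low + high, feat, atol=1e-8)
```

Because the low- and high-frequency branches partition the spectrum, no information is discarded by the split; what the network learns is how to weight and recombine the two bands, which is the role the abstract assigns to the DFM and CFF modules.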
ISSN:1424-8220