Attention-enhanced StrongSORT for robust vehicle tracking in complex environments

Bibliographic Details
Main Authors: Wei Xu, Xiaodong Du, Ruochen Li, Bingjie Li, Yuhu Jiao, Lei Xing
Format: Article
Language: English
Published: Nature Portfolio 2025-05-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-99524-5
Description
Summary: While multi-object tracking is critical for autonomous driving systems, traditional algorithms exhibit three fundamental limitations in complex scenarios: (1) blurred feature representation under occlusion and re-identification scenarios, causing identity switches; (2) insufficient sensitivity to scale-variant targets due to fixed geometric constraints in conventional IoU-based loss functions; and (3) gradient degradation in deep convolutional layers, hindering discriminative feature learning. To address these challenges, we propose AE-StrongSORT (Attention-Enhanced StrongSORT), an attention-enhanced tracking framework featuring three systematic innovations. First, the GAM-YOLO (global attention mechanism-YOLO) hybrid architecture integrates multi-scale feature fusion with a global attention mechanism (GC2f structure). This design enhances cross-dimensional feature interaction through localized channel-spatial attention gates, significantly improving occlusion-resistant feature representation (IDF1 ↑ 9.99%, IDsw ↓ 9.85%). Second, the F-EIoU loss function introduces dynamic size-dependent penalty terms and difficulty-adaptive weighting factors, effectively balancing learning priorities between small targets and normal instances. Third, the optimized CBH-Conv module employs Hardswish activation and depthwise separable convolution to mitigate gradient vanishing while maintaining real-time efficiency (achieving a 17% MOTA improvement at 213 FPS). Evaluated on the MOT-16 dataset, AE-StrongSORT demonstrates substantial improvements over the baseline StrongSORT, with 17%, 2.78%, and 9.99% gains in MOTA, HOTA, and IDF1 metrics respectively, alongside significant reductions in false/missed detections. These advances establish a novel technical pathway for robust vehicle tracking in real-world traffic scenarios characterized by coexisting challenges of scale variation, motion blur, and dense occlusion.
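Note: The summary names Hardswish activation and depthwise separable convolution as the ingredients of the CBH-Conv module but does not give the module's layout. As a rough illustration only, the sketch below uses the standard textbook definitions of both: Hardswish as x * ReLU6(x + 3) / 6, and the weight count of a depthwise separable convolution (one k x k filter per input channel followed by a 1 x 1 pointwise convolution) compared with a standard convolution. The function names and the 64-to-128-channel example are assumptions chosen for illustration, not taken from the article, and bias terms are ignored.

```python
def hardswish(x: float) -> float:
    """Standard Hardswish: x * ReLU6(x + 3) / 6."""
    relu6 = min(max(x + 3.0, 0.0), 6.0)  # clamp (x + 3) to the range [0, 6]
    return x * relu6 / 6.0


def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k conv plus a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only to show the scale of the saving.
    c_in, c_out, k = 64, 128, 3
    print("standard conv weights:      ", standard_conv_params(c_in, c_out, k))
    print("depthwise separable weights:", depthwise_separable_params(c_in, c_out, k))
    for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(f"hardswish({x}) = {hardswish(x):.4f}")
```

For this hypothetical 3 x 3 layer the separable form needs 8,768 weights against 73,728 for the standard convolution, which is the kind of saving that makes real-time throughput plausible. Whether CBH-Conv arranges these pieces exactly this way is not stated in this record.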
ISSN: 2045-2322