Transformer-Based Motion Predictor for Multi-Dancer Tracking in Non-Linear Movements of Dancesport Performance
| Main Author: | |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11028110/ |

| Summary: | Automated multi-dancer tracking is a critical yet challenging task in Dance Quality Assessment (DanceQA), requiring precise motion estimation to evaluate synchronization, formation transitions, and rhythmic accuracy. Traditional Multi-Object Tracking (MOT) frameworks predominantly rely on appearance-based features and Kalman Filter-based motion models, which struggle with complex, non-linear motion patterns exhibited in dance performances. These conventional approaches often suffer from identity fragmentation, occlusion-related failures, and inaccurate motion predictions due to their inherent assumption of constant velocity. Although recent deep learning-based trackers incorporating recurrent architectures and transformers have improved motion modeling, they still lack adaptability to highly dynamic motion variations and remain heavily reliant on large-scale training datasets. To bridge this gap, we propose the Multi-Dancer Spatio-Temporal Tracker (MDSTT), a novel transformer-based framework that exclusively leverages historical motion cues for robust and identity-consistent tracking. Unlike conventional tracking methods that integrate appearance features, MDSTT processes historical bounding box trajectories through a transformer encoder, capturing both long-range and short-term spatio-temporal dependencies while mitigating occlusion-induced identity switches. The proposed framework introduces a Historical Trajectory Embedding module to enhance motion-based representation learning, an Adaptable Motion Predictor with a learnable prediction token for improved trajectory continuity, and a refined Hungarian Matching strategy incorporating Intersection-over-Union (IoU), motion direction difference, and L1 distance to optimize object association. Additionally, probabilistic masked token augmentation is incorporated to simulate real-world occlusion scenarios, improving resilience against missing detections. Extensive evaluations on the DanceTrack dataset demonstrate that MDSTT achieves state-of-the-art (SoTA) tracking performance, surpassing existing methods with a 22.3% improvement in HOTA (77.4 vs. 63.3), 7.6% higher detection accuracy (86.4 vs. 80.3), and 26.6% better identity association accuracy (63.4 vs. 50.1) compared to SoTA transformer-based MOT models. |
|---|---|
| ISSN: | 2169-3536 |
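
The summary describes an association step that combines IoU, motion direction difference, and L1 distance. The sketch below shows one way such a combined cost could be assembled and minimized; it is not the authors' implementation. The function names, the weights `w_iou`, `w_dir`, `w_l1`, and the normalization are all assumptions made for illustration, and exhaustive search over permutations stands in for the Hungarian algorithm (it returns the same optimal assignment for a handful of dancers; `scipy.optimize.linear_sum_assignment` would be the usual choice at scale).

```python
import math
from itertools import permutations

# Boxes are (x1, y1, x2, y2). Weights are illustrative assumptions,
# not values taken from the paper.

def iou(a, b):
    """Intersection-over-Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def center(box):
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)

def direction_difference(prev_box, pred_box, det_box):
    """Angle in [0, pi] between the predicted motion vector and the
    motion vector implied by assigning det_box to the track."""
    px, py = center(prev_box)
    ax, ay = center(pred_box)[0] - px, center(pred_box)[1] - py
    bx, by = center(det_box)[0] - px, center(det_box)[1] - py
    norm = math.hypot(ax, ay) * math.hypot(bx, by)
    if norm == 0.0:
        return 0.0  # no motion: direction is undefined, treat as no penalty
    cos = max(-1.0, min(1.0, (ax * bx + ay * by) / norm))
    return math.acos(cos)

def l1_distance(a, b):
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(p - q) for p, q in zip(a, b))

def association_cost(prev_box, pred_box, det_box,
                     w_iou=1.0, w_dir=0.5, w_l1=0.01):
    """Weighted sum of the three cues; lower means a better match."""
    return (w_iou * (1.0 - iou(pred_box, det_box))
            + w_dir * direction_difference(prev_box, pred_box, det_box) / math.pi
            + w_l1 * l1_distance(pred_box, det_box))

def match(tracks, detections):
    """tracks: list of (prev_box, pred_box). Assumes
    len(tracks) <= len(detections). Returns (track_idx, det_idx) pairs
    minimizing total cost, found by brute force."""
    best = min(
        permutations(range(len(detections)), len(tracks)),
        key=lambda perm: sum(
            association_cost(t[0], t[1], detections[j])
            for t, j in zip(tracks, perm)
        ),
    )
    return list(enumerate(best))

# Two dancers moving toward each other; detections arrive in swapped order.
tracks = [((0, 0, 10, 10), (2, 0, 12, 10)),
          ((50, 0, 60, 10), (48, 0, 58, 10))]
detections = [(47, 0, 57, 10), (3, 0, 13, 10)]
pairs = match(tracks, detections)
```

In this toy case each track is paired with the detection nearest its predicted box, so the swapped detection order does not cause an identity switch. How the three terms are actually weighted and normalized in MDSTT is not stated in the summary.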