Transformer-Based Motion Predictor for Multi-Dancer Tracking in Non-Linear Movements of Dancesport Performance


Bibliographic Details
Main Author: Zhiling Wang
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/11028110/
Description
Summary: Automated multi-dancer tracking is a critical yet challenging task in Dance Quality Assessment (DanceQA), requiring precise motion estimation to evaluate synchronization, formation transitions, and rhythmic accuracy. Traditional Multi-Object Tracking (MOT) frameworks predominantly rely on appearance-based features and Kalman Filter-based motion models, which struggle with the complex, non-linear motion patterns exhibited in dance performances. These conventional approaches often suffer from identity fragmentation, occlusion-related failures, and inaccurate motion predictions due to their inherent assumption of constant velocity. Although recent deep learning-based trackers incorporating recurrent architectures and transformers have improved motion modeling, they still lack adaptability to highly dynamic motion variations and remain heavily reliant on large-scale training datasets. To bridge this gap, we propose the Multi-Dancer Spatio-Temporal Tracker (MDSTT), a novel transformer-based framework that exclusively leverages historical motion cues for robust and identity-consistent tracking. Unlike conventional tracking methods that integrate appearance features, MDSTT processes historical bounding box trajectories through a transformer encoder, capturing both long-range and short-term spatio-temporal dependencies while mitigating occlusion-induced identity switches. The proposed framework introduces a Historical Trajectory Embedding module to enhance motion-based representation learning, an Adaptable Motion Predictor with a learnable prediction token for improved trajectory continuity, and a refined Hungarian Matching strategy incorporating Intersection-over-Union (IoU), motion direction difference, and L1 distance to optimize object association. Additionally, probabilistic masked token augmentation is incorporated to simulate real-world occlusion scenarios, improving resilience against missing detections.
Extensive evaluations on the DanceTrack dataset demonstrate that MDSTT achieves state-of-the-art (SoTA) tracking performance, surpassing existing transformer-based MOT models with a 22.3% relative improvement in HOTA (77.4 vs. 63.3), 7.6% higher detection accuracy (86.4 vs. 80.3), and 26.6% better identity association accuracy (63.4 vs. 50.1).
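To make the association step concrete, the refined Hungarian matching described in the abstract, which combines IoU, motion direction difference, and L1 box distance, can be sketched as below. This is a minimal illustrative sketch, not the paper's implementation: the weights `w_iou`, `w_dir`, `w_l1` and the cosine-based direction normalization are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou(a, b):
    # Boxes as [x1, y1, x2, y2]; standard intersection-over-union.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)


def direction_diff(prev_c, pred_c, det_c):
    # Angle mismatch between the predicted motion vector and the
    # motion implied by a candidate detection, scaled to [0, 1].
    v1, v2 = pred_c - prev_c, det_c - prev_c
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return (1.0 - cos) / 2.0


def match(preds, dets, prev_boxes, w_iou=1.0, w_dir=0.3, w_l1=0.01):
    # Build a cost matrix mixing (1 - IoU), direction difference, and
    # L1 box distance, then solve the assignment with the Hungarian
    # algorithm (scipy's linear_sum_assignment). Weights are illustrative.
    cost = np.zeros((len(preds), len(dets)))
    for i, (p, pb) in enumerate(zip(preds, prev_boxes)):
        pred_c = np.array([(p[0] + p[2]) / 2, (p[1] + p[3]) / 2])
        prev_c = np.array([(pb[0] + pb[2]) / 2, (pb[1] + pb[3]) / 2])
        for j, d in enumerate(dets):
            det_c = np.array([(d[0] + d[2]) / 2, (d[1] + d[3]) / 2])
            cost[i, j] = (w_iou * (1.0 - iou(p, d))
                          + w_dir * direction_diff(prev_c, pred_c, det_c)
                          + w_l1 * np.abs(np.array(p) - np.array(d)).sum())
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))
```

Each row of the cost matrix corresponds to a predicted track box and each column to a detection; the combined cost lets the matcher disambiguate dancers whose boxes overlap but whose motion directions differ, which is the scenario where pure IoU matching breaks.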
ISSN: 2169-3536