Spatio-Temporal Feature Aware Vision Transformers for Real-Time Unmanned Aerial Vehicle Tracking