Spatio-Temporal Feature Aware Vision Transformers for Real-Time Unmanned Aerial Vehicle Tracking

Driven by the rapid advancement of Unmanned Aerial Vehicle (UAV) technology, the field of UAV object tracking has witnessed significant progress. This study introduces an innovative single-stream UAV tracking architecture, dubbed NT-Track, which is dedicated to enhancing the efficiency and accuracy...

Bibliographic Details
Main Authors: Hao Zhang, Hengzhou Ye, Xiaoyu Guo, Xu Zhang, Yao Rong, Shuiwang Li
Format: Article
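Language: English
Published: MDPI AG 2025-01-01
Series: Drones
Subjects: UAV tracking; temporal relationships; spatial neighborhood feature extraction; transformer network; real-time tracking
Online Access: https://www.mdpi.com/2504-446X/9/1/68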
_version_ 1832588618266312704
author Hao Zhang
Hengzhou Ye
Xiaoyu Guo
Xu Zhang
Yao Rong
Shuiwang Li
author_sort Hao Zhang
collection DOAJ
description Driven by rapid advances in Unmanned Aerial Vehicle (UAV) technology, UAV object tracking has progressed significantly. This study introduces a single-stream UAV tracking architecture, NT-Track, designed to improve the efficiency and accuracy of real-time tracking. To address the weakness of existing trackers in capturing temporal relationships between consecutive frames, NT-Track analyzes the positional changes of targets across frames and leverages the similarity of the surrounding areas to extract feature information. A temporal feature fusion technique integrates spatial and temporal information into a unified framework, improving the overall performance of the model. NT-Track also incorporates a spatial neighborhood feature extraction module, which extracts features within the neighborhood of the target in each frame and keeps the focus on the target during inter-frame processing. With an improved Transformer backbone network, the approach integrates spatio-temporal information, enhancing tracking accuracy and robustness. Experimental results on several challenging benchmark datasets show that NT-Track surpasses existing lightweight and deep learning trackers in precision and success rate. Notably, on the VisDrone2018 benchmark, NT-Track achieved a precision rate of 90% for the first time, which demonstrates its performance in complex environments and supports its potential in practical applications.
format Article
id doaj-art-463fdd4f1a3b49a08380883f2bf7740d
institution Kabale University
issn 2504-446X
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Drones
spelling doaj-art-463fdd4f1a3b49a08380883f2bf7740d
2025-01-24T13:29:52Z
eng
MDPI AG
Drones
2504-446X
2025-01-01
vol. 9, no. 1, art. 68
doi:10.3390/drones9010068
Spatio-Temporal Feature Aware Vision Transformers for Real-Time Unmanned Aerial Vehicle Tracking
Hao Zhang [0]
Hengzhou Ye [1]
Xiaoyu Guo [2]
Xu Zhang [3]
Yao Rong [4]
Shuiwang Li [5]
[0] College of Computer Science and Engineering, Guilin University of Technology, Guilin 541006, China
[1] College of Computer Science and Engineering, Guilin University of Technology, Guilin 541006, China
[2] College of Computer Science and Engineering, Guilin University of Technology, Guilin 541006, China
[3] State Environmental Protection Key Laboratory of Aquatic Ecosystem Health in the Middle and Lower Reaches of Yangtze River, Jiangsu Provincial Academy of Environmental Science, Nanjing 210036, China
[4] Yunnan Key Laboratory of Statistical Modeling and Data Analysis, Yunnan University, Kunming 650504, China
[5] College of Computer Science and Engineering, Guilin University of Technology, Guilin 541006, China
https://www.mdpi.com/2504-446X/9/1/68
UAV tracking
temporal relationships
spatial neighborhood feature extraction
transformer network
real-time tracking
title Spatio-Temporal Feature Aware Vision Transformers for Real-Time Unmanned Aerial Vehicle Tracking
title_sort spatio temporal feature aware vision transformers for real time unmanned aerial vehicle tracking
topic UAV tracking
temporal relationships
spatial neighborhood feature extraction
transformer network
real-time tracking
url https://www.mdpi.com/2504-446X/9/1/68