Spatio-Temporal Feature Aware Vision Transformers for Real-Time Unmanned Aerial Vehicle Tracking
Driven by the rapid advancement of Unmanned Aerial Vehicle (UAV) technology, the field of UAV object tracking has witnessed significant progress. This study introduces an innovative single-stream UAV tracking architecture, dubbed NT-Track, which is dedicated to enhancing the efficiency and accuracy...
Main Authors: | Hao Zhang, Hengzhou Ye, Xiaoyu Guo, Xu Zhang, Yao Rong, Shuiwang Li |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2025-01-01 |
Series: | Drones |
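Subjects: | UAV tracking; temporal relationships; spatial neighborhood feature extraction; transformer network; real-time tracking |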
Online Access: | https://www.mdpi.com/2504-446X/9/1/68 |
_version_ | 1832588618266312704 |
---|---|
author | Hao Zhang; Hengzhou Ye; Xiaoyu Guo; Xu Zhang; Yao Rong; Shuiwang Li |
author_facet | Hao Zhang; Hengzhou Ye; Xiaoyu Guo; Xu Zhang; Yao Rong; Shuiwang Li |
author_sort | Hao Zhang |
collection | DOAJ |
description | Driven by the rapid advancement of Unmanned Aerial Vehicle (UAV) technology, the field of UAV object tracking has witnessed significant progress. This study introduces an innovative single-stream UAV tracking architecture, dubbed NT-Track, which is dedicated to enhancing the efficiency and accuracy of real-time tracking tasks. Addressing the shortcomings of existing tracking systems in capturing temporal relationships between consecutive frames, NT-Track meticulously analyzes the positional changes in targets across frames and leverages the similarity of the surrounding areas to extract feature information. Furthermore, our method integrates spatial and temporal information seamlessly into a unified framework through the introduction of a temporal feature fusion technique, thereby bolstering the overall performance of the model. NT-Track also incorporates a spatial neighborhood feature extraction module, which focuses on identifying and extracting features within the neighborhood of the target in each frame, ensuring continuous focus on the target during inter-frame processing. By employing an improved Transformer backbone network, our approach effectively integrates spatio-temporal information, enhancing the accuracy and robustness of tracking. Our experimental results on several challenging benchmark datasets demonstrate that NT-Track surpasses existing lightweight and deep learning trackers in terms of precision and success rate. It is noteworthy that, on the VisDrone2018 benchmark, NT-Track achieved a precision rate of 90% for the first time, an accomplishment that not only showcases its exceptional performance in complex environments, but also confirms its potential and effectiveness in practical applications. |
format | Article |
id | doaj-art-463fdd4f1a3b49a08380883f2bf7740d |
institution | Kabale University |
issn | 2504-446X |
language | English |
publishDate | 2025-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Drones |
spelling | doaj-art-463fdd4f1a3b49a08380883f2bf7740d; 2025-01-24T13:29:52Z; eng; MDPI AG; Drones; 2504-446X; 2025-01-01; vol. 9, no. 1, art. 68; 10.3390/drones9010068; Spatio-Temporal Feature Aware Vision Transformers for Real-Time Unmanned Aerial Vehicle Tracking; Hao Zhang (College of Computer Science and Engineering, Guilin University of Technology, Guilin 541006, China); Hengzhou Ye (College of Computer Science and Engineering, Guilin University of Technology, Guilin 541006, China); Xiaoyu Guo (College of Computer Science and Engineering, Guilin University of Technology, Guilin 541006, China); Xu Zhang (State Environmental Protection Key Laboratory of Aquatic Ecosystem Health in the Middle and Lower Reaches of Yangtze River, Jiangsu Provincial Academy of Environmental Science, Nanjing 210036, China); Yao Rong (Yunnan Key Laboratory of Statistical Modeling and Data Analysis, Yunnan University, Kunming 650504, China); Shuiwang Li (College of Computer Science and Engineering, Guilin University of Technology, Guilin 541006, China); Driven by the rapid advancement of Unmanned Aerial Vehicle (UAV) technology, the field of UAV object tracking has witnessed significant progress. This study introduces an innovative single-stream UAV tracking architecture, dubbed NT-Track, which is dedicated to enhancing the efficiency and accuracy of real-time tracking tasks. Addressing the shortcomings of existing tracking systems in capturing temporal relationships between consecutive frames, NT-Track meticulously analyzes the positional changes in targets across frames and leverages the similarity of the surrounding areas to extract feature information. Furthermore, our method integrates spatial and temporal information seamlessly into a unified framework through the introduction of a temporal feature fusion technique, thereby bolstering the overall performance of the model. NT-Track also incorporates a spatial neighborhood feature extraction module, which focuses on identifying and extracting features within the neighborhood of the target in each frame, ensuring continuous focus on the target during inter-frame processing. By employing an improved Transformer backbone network, our approach effectively integrates spatio-temporal information, enhancing the accuracy and robustness of tracking. Our experimental results on several challenging benchmark datasets demonstrate that NT-Track surpasses existing lightweight and deep learning trackers in terms of precision and success rate. It is noteworthy that, on the VisDrone2018 benchmark, NT-Track achieved a precision rate of 90% for the first time, an accomplishment that not only showcases its exceptional performance in complex environments, but also confirms its potential and effectiveness in practical applications.; https://www.mdpi.com/2504-446X/9/1/68; UAV tracking; temporal relationships; spatial neighborhood feature extraction; transformer network; real-time tracking |
spellingShingle | Hao Zhang Hengzhou Ye Xiaoyu Guo Xu Zhang Yao Rong Shuiwang Li Spatio-Temporal Feature Aware Vision Transformers for Real-Time Unmanned Aerial Vehicle Tracking Drones UAV tracking temporal relationships spatial neighborhood feature extraction transformer network real-time tracking |
title | Spatio-Temporal Feature Aware Vision Transformers for Real-Time Unmanned Aerial Vehicle Tracking |
title_full | Spatio-Temporal Feature Aware Vision Transformers for Real-Time Unmanned Aerial Vehicle Tracking |
title_fullStr | Spatio-Temporal Feature Aware Vision Transformers for Real-Time Unmanned Aerial Vehicle Tracking |
title_full_unstemmed | Spatio-Temporal Feature Aware Vision Transformers for Real-Time Unmanned Aerial Vehicle Tracking |
title_short | Spatio-Temporal Feature Aware Vision Transformers for Real-Time Unmanned Aerial Vehicle Tracking |
title_sort | spatio temporal feature aware vision transformers for real time unmanned aerial vehicle tracking |
topic | UAV tracking; temporal relationships; spatial neighborhood feature extraction; transformer network; real-time tracking |
url | https://www.mdpi.com/2504-446X/9/1/68 |
work_keys_str_mv | AT haozhang spatiotemporalfeatureawarevisiontransformersforrealtimeunmannedaerialvehicletracking AT hengzhouye spatiotemporalfeatureawarevisiontransformersforrealtimeunmannedaerialvehicletracking AT xiaoyuguo spatiotemporalfeatureawarevisiontransformersforrealtimeunmannedaerialvehicletracking AT xuzhang spatiotemporalfeatureawarevisiontransformersforrealtimeunmannedaerialvehicletracking AT yaorong spatiotemporalfeatureawarevisiontransformersforrealtimeunmannedaerialvehicletracking AT shuiwangli spatiotemporalfeatureawarevisiontransformersforrealtimeunmannedaerialvehicletracking |