An Efficient Pyramid Transformer Network for Cross-View Geo-Localization in Complex Terrains

Bibliographic Details
Main Authors: Chengjie Ju, Wangping Xu, Nanxing Chen, Enhui Zheng
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Drones
Online Access: https://www.mdpi.com/2504-446X/9/5/379
Description
Summary: Unmanned aerial vehicle (UAV) self-localization in complex environments is critical when global navigation satellite systems (GNSSs) are unreliable. Existing datasets, often limited to low-altitude urban scenes, hinder generalization. This study introduces Multi-UAV, a novel dataset with 17.4 k high-resolution UAV–satellite image pairs from diverse terrains (urban, rural, mountainous, farmland, coastal) and altitudes across China, enhancing cross-view geolocalization research. We propose a lightweight value reduction pyramid transformer (VRPT) for efficient feature extraction and a residual feature pyramid network (RFPN) for multi-scale feature fusion. Using meter-level accuracy (MA@K) and relative distance score (RDS), VRPT achieves robust, high-precision localization across varied terrains, offering significant potential for resource-constrained UAV deployment.
ISSN: 2504-446X