Multistage Training and Fusion Method for Imbalanced Multimodal UAV Remote Sensing Classification

Bibliographic Details
Main Authors: Shihao Wang, Zhengwei Xu, Yun Lin
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Online Access: https://ieeexplore.ieee.org/document/11071999/
Description
Summary: In remote sensing applications, autonomous aerial vehicles (AAVs) overcome the limitations of single-sensor approaches by integrating multiple sensors and fusing cross-modal data, significantly improving target classification accuracy. However, during multimodal learning, the effectiveness of fusion is severely affected by modality imbalance caused by inconsistent gradient directions when integrating heterogeneous information. Existing methods predominantly focus on parameter tuning and gradient modulation, failing to resolve the inherent conflicts arising from divergent modality optimization trajectories. To address these limitations, we propose a gradient-criterion multistage training (GCMT) framework, which systematically resolves gradient conflicts through an alternating freezing strategy, optimizing unimodal branches by evaluating the consistency between unimodal and multimodal gradient directions. Building on GCMT, we further introduce an information entropy measurement fusion (IEMF) module, which dynamically adjusts cross-modal feature fusion weights using entropy-based metrics to mitigate overreliance on dominant modalities while preserving synergistic interactions. We build a multimodal dataset of signals and images based on the UAV platform, and extensive experiments are conducted on both our self-constructed dataset and public datasets. The results not only demonstrate a significant performance improvement of GCMT over state-of-the-art methods but also validate the efficacy of GCMT in harmonizing gradient alignment and of IEMF in enabling balanced multimodal fusion.
ISSN: 1939-1404, 2151-1535
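The summary names two mechanisms without giving their formulas. The Python sketch below illustrates one plausible reading of each, under stated assumptions. First, a gradient-direction check flags a unimodal branch for freezing when the cosine similarity between its unimodal-loss gradient and the multimodal-loss gradient on the same branch parameters falls below a threshold (0 here). Second, fusion weights are computed as a softmax over negative prediction entropies, so a more confident (lower-entropy) modality receives a larger but still soft weight. The function names (`branches_to_freeze`, `entropy_fusion_weights`, `fuse`), the threshold, the temperature, and the softmax weighting rule are all hypothetical illustrations, not the GCMT or IEMF definitions from the article, and the alternating multistage schedule itself is not reproduced.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two flattened gradient vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero gradient as carrying no direction
    return dot / (norm_a * norm_b)


def branches_to_freeze(grads: Dict[str, Tuple[List[float], List[float]]],
                       threshold: float = 0.0) -> List[str]:
    """Hypothetical criterion: for each branch, `grads` holds
    (gradient of the unimodal loss, gradient of the multimodal loss),
    both taken w.r.t. that branch's parameters. A branch whose two
    gradients point in conflicting directions is flagged for freezing."""
    return sorted(name for name, (g_uni, g_multi) in grads.items()
                  if cosine_similarity(g_uni, g_multi) < threshold)


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_fusion_weights(modality_probs: Dict[str, List[float]],
                           temperature: float = 1.0) -> Dict[str, float]:
    """Hypothetical weighting: softmax over -H/T, so lower-entropy
    modalities get larger weights; a higher temperature flattens the
    weights and limits how much one modality can dominate."""
    scores = {m: -entropy(p) / temperature for m, p in modality_probs.items()}
    max_score = max(scores.values())  # subtract max for numerical stability
    exps = {m: math.exp(s - max_score) for m, s in scores.items()}
    total = sum(exps.values())
    return {m: e / total for m, e in exps.items()}


def fuse(features: Dict[str, List[float]],
         weights: Dict[str, float]) -> List[float]:
    """Weighted sum of equally sized per-modality feature vectors."""
    dim = len(next(iter(features.values())))
    return [sum(weights[m] * features[m][i] for m in features)
            for i in range(dim)]


# Illustrative values only, not data from the article.
grads = {
    "signal": ([0.2, -0.1, 0.4], [0.3, -0.2, 0.5]),  # aligned directions
    "image": ([0.5, 0.1, -0.3], [-0.4, 0.0, 0.2]),   # conflicting directions
}
print(branches_to_freeze(grads))  # ['image']

weights = entropy_fusion_weights({"signal": [0.9, 0.05, 0.05],
                                  "image": [0.4, 0.3, 0.3]})
print(weights)  # the lower-entropy "signal" prediction gets the larger weight
```

Using a temperature-scaled softmax rather than hard selection is a deliberate choice in this sketch: it keeps every modality's contribution nonzero, which is one simple way to avoid overreliance on a single dominant modality.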