Multistage Training and Fusion Method for Imbalanced Multimodal UAV Remote Sensing Classification
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE 2025-01-01 |
| Series: | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11071999/ |
| Summary: | In remote sensing applications, autonomous aerial vehicles (AAVs) overcome the limitations of single-sensor approaches by integrating multiple sensors and fusing cross-modal data, significantly improving target classification accuracy. However, during the process of multimodal learning, the effectiveness of fusion is severely affected by modality imbalance caused by inconsistent gradient directions when integrating heterogeneous information. Existing methods predominantly focus on parameter tuning and gradient modulation, failing to resolve inherent conflicts from divergent modality optimization trajectories. To address these limitations, we propose a gradient-criterion multistage training (GCMT) framework, which systematically resolves gradient conflicts through an alternating freezing strategy, optimizing unimodal branches by evaluating consistency between unimodal and multimodal gradient directions. Building on the GCMT, we further introduce an information entropy measurement fusion (IEMF) module, which dynamically adjusts cross-modal feature fusion weights using entropy-based metrics to mitigate overreliance on dominant modalities while preserving synergistic interactions. We build a multimodal dataset of signals and images based on the UAV platform, and extensive experiments are implemented on both our self-constructed and public datasets. The results not only demonstrate a significant improvement in the performance of our GCMT compared to state-of-the-art methods, but also validate the efficacy of GCMT in harmonizing gradient alignment and of IEMF in enabling balanced multimodal fusion. |
| ISSN: | 1939-1404 2151-1535 |
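
The summary names two ideas without giving formulas: checking whether unimodal and multimodal gradient directions agree, and weighting modalities by an entropy-based measure. The sketch below illustrates one plausible reading of each in plain Python. It is not the article's GCMT or IEMF implementation. All function names are hypothetical, the use of cosine similarity as the consistency measure is an assumption, and so is the rule that a lower-entropy (more confident) prediction receives a larger fusion weight.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_fusion_weights(prob_list):
    """Per-modality fusion weights from prediction entropy.

    Assumption (not from the article): the weight of a modality is
    proportional to (maximum possible entropy - its entropy), so a more
    confident modality gets a larger weight.
    """
    max_h = math.log(len(prob_list[0]))
    confidences = [max(0.0, max_h - entropy(p)) for p in prob_list]
    total = sum(confidences)
    if total <= 1e-12:
        # Every modality is maximally uncertain: fall back to equal weights.
        return [1.0 / len(prob_list)] * len(prob_list)
    return [c / total for c in confidences]


def fuse(prob_list):
    """Weighted average of the per-modality class probabilities."""
    weights = entropy_fusion_weights(prob_list)
    n_classes = len(prob_list[0])
    return [
        sum(w * p[k] for w, p in zip(weights, prob_list))
        for k in range(n_classes)
    ]


def gradient_cosine(g_uni, g_multi):
    """Cosine similarity between two flattened gradient vectors.

    Assumption: a negative value is read as a conflict between the
    unimodal and the multimodal update directions.
    """
    dot = sum(a * b for a, b in zip(g_uni, g_multi))
    norm_uni = math.sqrt(sum(a * a for a in g_uni))
    norm_multi = math.sqrt(sum(b * b for b in g_multi))
    if norm_uni == 0.0 or norm_multi == 0.0:
        return 0.0
    return dot / (norm_uni * norm_multi)


if __name__ == "__main__":
    image_probs = [0.9, 0.1]    # hypothetical confident image branch
    signal_probs = [0.6, 0.4]   # hypothetical less certain signal branch
    print(entropy_fusion_weights([image_probs, signal_probs]))
    print(fuse([image_probs, signal_probs]))
    print(gradient_cosine([1.0, 0.5], [-1.0, 0.2]))
```

In a training loop, a check such as `gradient_cosine(...) < 0` could be used to decide which branch to freeze in a given stage, but the actual criterion and freezing schedule are defined in the article, not here.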