Self- and Cross-Attention Enhanced Transformer for Visible and Thermal Infrared Hyperspectral Image Classification
Visible hyperspectral image (V-HSI) and thermal infrared hyperspectral image (TI-HSI) have been crucial data sources for land cover classification. V-HSI can directly provide information of land surface, such as shape, color, texture, and other features. TI-HSI contains rich long-wave spectral infor...
Saved in:
| Main Authors: | , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: |
IEEE
2025-01-01
|
| Series: | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11006409/ |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | Visible hyperspectral image (V-HSI) and thermal infrared hyperspectral image (TI-HSI) have been crucial data sources for land cover classification. V-HSI can directly provide information of land surface, such as shape, color, texture, and other features. TI-HSI contains rich long-wave spectral information, which can reflect the unique emission characteristics of ground objects in the thermal infrared spectral range. To fully leverage the advantages of V-HSI and TI-HSI while enhancing the classification accuracy, this article proposes a self- and cross-attention enhanced transformer network (SCAET), integrated with convolutional neural network (CNN) for HSI classification. Initially, the proposed method employs a dual-branch spatial-spectral CNN (SS CNN) to extract spectral convolution features from V-HSI and TI-HSI, respectively. Subsequently, a spectral feature mapping (SFM) module is proposed to perform feature transformation, extracting independent and interactive features of V-HSI and TI-HSI. Then, a self- and cross-attention interactive enhancement module is designed to extract deeper features and enhance the independent features by using the interactive features. In addition, a self-projection mixing module is formulated to promote feature interaction and improve the generalization capability of the model. To validate the effectiveness of the proposed network, extensive experiments are conducted on real-world datasets, and the results indicate that SCAET significantly outperforms current multisource fusion networks. |
|---|---|
| ISSN: | 1939-1404 2151-1535 |