Self- and Cross-Attention Enhanced Transformer for Visible and Thermal Infrared Hyperspectral Image Classification

Bibliographic Details
Main Authors:	Enyu Zhao, Yongfang Su, Nianxin Qu, Yulei Wang, Caixia Gao, Jian Zeng
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects:	Convolutional neural network (CNN); image classification; thermal infrared hyperspectral image (TI-HSI); transformer; visible hyperspectral image (V-HSI)
Online Access:	https://ieeexplore.ieee.org/document/11006409/

Description
Summary:	Visible hyperspectral images (V-HSI) and thermal infrared hyperspectral images (TI-HSI) are crucial data sources for land cover classification. V-HSI directly provides information about the land surface, such as shape, color, and texture. TI-HSI contains rich long-wave spectral information that reflects the unique emission characteristics of ground objects in the thermal infrared spectral range. To fully leverage the advantages of both V-HSI and TI-HSI and improve classification accuracy, this article proposes a self- and cross-attention enhanced transformer network (SCAET), integrated with a convolutional neural network (CNN), for HSI classification. First, the proposed method employs a dual-branch spatial-spectral CNN (SS CNN) to extract spectral convolution features from V-HSI and TI-HSI, respectively. Next, a spectral feature mapping (SFM) module performs feature transformation, extracting independent and interactive features of V-HSI and TI-HSI. Then, a self- and cross-attention interactive enhancement module extracts deeper features and enhances the independent features using the interactive features. In addition, a self-projection mixing module promotes feature interaction and improves the generalization capability of the model. Extensive experiments on real-world datasets indicate that SCAET significantly outperforms current multisource fusion networks.
ISSN:	1939-1404 2151-1535
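
Note: The summary mentions self- and cross-attention between the two image branches. The sketch below illustrates the generic idea only and is not the authors' SCAET implementation. It assumes plain scaled dot-product attention with no learned projections, toy two-dimensional tokens, and a simple residual sum to combine the outputs. The names attention, self_and_cross, vis_tokens, and tir_tokens are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    Assumption: no learned query/key/value projections, unlike a real
    transformer layer.
    """
    scale = math.sqrt(len(keys[0]))
    output = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        width = len(values[0])
        output.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)]
        )
    return output


def self_and_cross(own_tokens, other_tokens):
    """Enhance one branch with self-attention plus cross-attention.

    Self-attention: queries, keys, and values all come from own_tokens.
    Cross-attention: queries from own_tokens, keys/values from other_tokens.
    Assumption: the outputs are merged by a residual sum.
    """
    self_out = attention(own_tokens, own_tokens, own_tokens)
    cross_out = attention(own_tokens, other_tokens, other_tokens)
    return [
        [o + s + c for o, s, c in zip(own, slf, crs)]
        for own, slf, crs in zip(own_tokens, self_out, cross_out)
    ]


# Toy tokens standing in for visible and thermal infrared features.
vis_tokens = [[1.0, 0.0], [0.0, 1.0]]
tir_tokens = [[0.5, 0.5], [1.0, 1.0], [0.0, 2.0]]
enhanced_vis = self_and_cross(vis_tokens, tir_tokens)
```

The enhanced output keeps the shape of the querying branch (two tokens of width two here), even though the other branch has a different number of tokens.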

Similar Items