Multiscale Attention Feature Fusion Based on Improved Transformer for Hyperspectral Image and LiDAR Data Classification


Bibliographic Details
Main Authors: Aili Wang, Guilong Lei, Shiyu Dai, Haibin Wu, Yuji Iwahori
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects: Hyperspectral image (HSI); interaction transformer; light detection and ranging (LiDAR); multisource data classification; three-dimensional convolutional neural network (3D-CNN)
Online Access: https://ieeexplore.ieee.org/document/10818716/
collection DOAJ
description With the continuous evolution of remote sensing, the range of available data sources has expanded, and effectively utilizing useful information from multiple sources for better land-surface observation has become an intriguing and challenging problem. However, the complexity of urban areas and their surrounding structures makes it extremely difficult to capture correlations between features. This article proposes a novel multiscale attention feature fusion network, composed of hierarchical convolutional neural networks and a transformer, to enhance the joint classification accuracy of hyperspectral image (HSI) and light detection and ranging (LiDAR) data. First, a multiscale fusion Swin transformer module is employed to eliminate information loss during feature propagation; it explores deep spatial–spectral features of the HSI while extracting height information from the LiDAR data. This structure inherits the advantages of the Swin transformer, achieving nonlocal receptive-field fusion by progressively expanding the window's receptive field layer by layer while preserving the spatial features of the image, and it exhibits excellent robustness against spatial misalignment. For the dual hyperspectral and LiDAR branches, a dual-source feature interactor is designed that establishes a dynamic attention mechanism between hyperspectral and LiDAR features, effectively capturing correlated information between the two modalities and fusing it into a unified feature representation. The efficacy of the proposed approach is validated on three standard datasets (Houston2013, Trento, and MUUFL). The classification results indicate that the proposed framework, by fully utilizing spatial context information and effectively integrating feature information, significantly outperforms state-of-the-art classification methods.
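The dual-source feature interactor is described above only at a high level. As an illustration of the general cross-attention pattern it alludes to (not the authors' actual implementation, which is not given in this record), here is a minimal NumPy sketch in which each modality's tokens attend to the other modality's tokens before fusion; the token counts, feature dimension, and the absence of learned projection matrices are all simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, key_feats, d_k):
    # query_feats: (n_q, d), key_feats: (n_k, d)
    # scaled dot-product attention of one modality's tokens over the other's
    scores = query_feats @ key_feats.T / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ key_feats

rng = np.random.default_rng(0)
hsi = rng.standard_normal((16, 64))    # 16 HSI spatial-spectral tokens, 64-dim
lidar = rng.standard_normal((16, 64))  # 16 LiDAR height-feature tokens, 64-dim

# each modality attends to the other, then the results are fused
hsi_enriched = cross_attention(hsi, lidar, 64)
lidar_enriched = cross_attention(lidar, hsi, 64)
fused = np.concatenate([hsi_enriched, lidar_enriched], axis=-1)
print(fused.shape)  # (16, 128)
```

In a trained network the queries, keys, and values would pass through learned linear projections and the attention weights would be dynamic per input, which is presumably what the abstract's "dynamic attention mechanism" refers to; the sketch only shows the data flow of bidirectional cross-modal attention followed by concatenation.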
id doaj-art-73f05db7bf014c2cb58dbcdff242e783
institution Kabale University
issn 1939-1404
2151-1535
spelling IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 18, pp. 4124–4140, 2025-01-01, doi: 10.1109/JSTARS.2024.3524443 (IEEE document 10818716)
Aili Wang (ORCID 0000-0002-9118-230X), Guilong Lei, Shiyu Dai, and Haibin Wu (ORCID 0000-0002-2453-3691): Heilongjiang Province Key Laboratory of Laser Spectroscopy Technology and Application, College of Measurement and Control Technology and Communication Engineering, Harbin University of Science and Technology, Harbin, China
Yuji Iwahori (ORCID 0000-0002-6421-8186): Department of Computer Science, Chubu University, Kasugai, Japan
topic Hyperspectral image (HSI)
interaction transformer
light detection and ranging (LiDAR)
multisource data classification
three-dimensional convolutional neural network (3D-CNN)