Semantic Tokenization-Based Mamba for Hyperspectral Image Classification

Bibliographic Details
Main Authors: Ri Ming, Na Chen, Jiangtao Peng, Weiwei Sun, Zhijing Ye
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10838328/
Description
Summary: Transformer-based models have recently shown superior performance in hyperspectral image classification (HSIC) owing to their ability to model long-term dependencies in sequence data. An important component of the transformer is the tokenizer, which transforms features into semantic token sequences (STS). However, because of its global receptive field, the transformer's semantic tokenization strategy struggles to represent relatively important local high-level semantics. More recently, Mamba-based methods have shown even stronger spatial context modeling ability than transformers for HSIC. These Mamba-based methods, however, focus mainly on the spectral and spatial dimensions: they tend either to extract semantic information from very long feature sequences or to represent it with a few typical tokens, which may miss some important semantics of hyperspectral images (HSIs). To represent the semantic information of HSIs more holistically in Mamba, this article proposes a semantic tokenization-based Mamba (STMamba) model. In STMamba, a spectral-spatial feature extraction module first extracts joint spectral-spatial features. A generated semantic token sequences module then transforms these features into STS. The STS are subsequently fed into the semantic token state spatial model to capture relationships between different semantic tokens. Finally, the fused semantic token is passed to a classifier. Experimental results on three HSI datasets demonstrate that the proposed STMamba outperforms existing state-of-the-art deep learning and transformer-based methods.
ISSN: 1939-1404
2151-1535
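
The four stages named in the summary (feature extraction, semantic tokenization, a state-space pass over the tokens, classification of a fused token) can be illustrated with a toy sketch. This is not the STMamba implementation: the record gives no architectural details, so everything below is an assumption made for illustration. The tokenizer is a generic attention-pooling tokenizer, the sequence model is a plain linear recurrence with a fixed diagonal transition (not Mamba's selective scan), the fusion step is a simple mean, and all names (`semantic_tokens`, `ssm_scan`, `classify`, the `params` keys) and sizes are hypothetical. The feature-extraction stage is replaced by random per-pixel feature vectors.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def semantic_tokens(features, w_attn):
    """Pool per-pixel features into one token per row of w_attn.

    Each row of w_attn scores every pixel; a softmax over pixels turns the
    scores into weights, and the token is the weighted mean of the features.
    (Assumed tokenizer, not taken from the article.)
    """
    d = len(features[0])
    tokens = []
    for w in w_attn:
        scores = [sum(wi * fi for wi, fi in zip(w, f)) for f in features]
        weights = softmax(scores)
        tokens.append(
            [sum(wt * f[j] for wt, f in zip(weights, features)) for j in range(d)]
        )
    return tokens


def ssm_scan(tokens, a, b, c):
    """Linear state-space recurrence: h_t = a * h_{t-1} + B x_t, y_t = C h_t.

    `a` is a diagonal transition stored as a list. Unlike Mamba, the
    parameters here do not depend on the input.
    """
    h = [0.0] * len(a)
    outputs = []
    for x in tokens:
        bx = matvec(b, x)
        h = [ai * hi + bxi for ai, hi, bxi in zip(a, h, bx)]
        outputs.append(matvec(c, h))
    return outputs


def classify(features, params):
    """Tokenize, scan, fuse the outputs by averaging, and apply a linear classifier."""
    tokens = semantic_tokens(features, params["w_attn"])
    seq = ssm_scan(tokens, params["a"], params["b"], params["c"])
    fused = [sum(col) / len(seq) for col in zip(*seq)]
    return softmax(matvec(params["w_cls"], fused))


def random_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: a 5x5 patch flattened to 25 pixels with 8 features each.
rng = random.Random(0)
d, n_pixels, n_tokens, n_state, n_classes = 8, 25, 4, 6, 3
features = random_matrix(rng, n_pixels, d)  # stand-in for extracted features
params = {
    "w_attn": random_matrix(rng, n_tokens, d),
    "a": [0.9] * n_state,
    "b": random_matrix(rng, n_state, d),
    "c": random_matrix(rng, d, n_state),
    "w_cls": random_matrix(rng, n_classes, d),
}
probs = classify(features, params)
print(probs)
```

The point of the sketch is the data flow: 25 pixel features are reduced to 4 tokens before the recurrence runs, so the sequence model sees a short sequence of semantic summaries rather than every pixel.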