BinaryViT: Binary Vision Transformer for Hyperspectral Image Classification

Bibliographic Details
Main Authors: Xiang Hu, Taolin Liu, Zhe Guo, Yuxiang Tang, Yuanxi Peng, Tong Zhou
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Online Access: https://ieeexplore.ieee.org/document/11072278/
Description
Summary: Vision transformers have demonstrated remarkable performance in hyperspectral image classification tasks. However, their complex computational mechanisms and excessive parameterization severely restrict deployment on resource-constrained platforms, such as FPGAs and embedded CPUs. As a key technology for lightweight deep models, binary quantization achieves significant parameter compression and computational acceleration by binarizing activations and weights. However, binary quantization in transformers faces challenges such as degradation of feature representation capability after binarizing self-attention mechanisms and decline in fusion efficiency of multiscale spectral–spatial information, leading to relatively lagging progress in this field. To address these issues, this study proposes a novel binary vision transformer architecture tailored for hyperspectral image classification. Built upon traditional transformers, the approach innovatively introduces a self-adaptive softmax binarization module, which dynamically adjusts the binarization threshold distribution to effectively mitigate discretization errors in gradient propagation during the binarization process. Meanwhile, a multibranch average pooling block is designed to enable hierarchical aggregation of features across different spectral dimensions, significantly enhancing the model's ability to represent complex spectral–spatial correlations in hyperspectral data. The proposed method achieves over 99% parameter binarization and reduces floating-point computation by more than 89% compared to full-precision counterparts, while maintaining competitive classification accuracy. Experiments conducted on seven benchmark hyperspectral datasets demonstrate the effectiveness of our approach in balancing computational efficiency and classification performance.
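The binary quantization the abstract describes can be illustrated with a minimal forward-pass sketch in NumPy. This is a generic sign-binarization scheme with a per-tensor scaling factor (in the style of common binary-network formulations), not a reproduction of the paper's self-adaptive softmax binarization module or multibranch average pooling block; all function names here are hypothetical.

```python
import numpy as np

def binarize(x):
    # Sign binarization with a per-tensor scaling factor that preserves
    # the mean magnitude of the full-precision tensor (a common choice).
    # np.where maps exact zeros to +1 so every output is strictly +/-1.
    alpha = np.abs(x).mean()
    return alpha * np.sign(np.where(x == 0, 1.0, x))

def binary_linear(x, w):
    # Dense layer with both activations and weights binarized.
    # On hardware, the +/-1 matrix multiply can be replaced by
    # XNOR + popcount, which is the source of the speedup the
    # abstract refers to. Training typically uses a straight-through
    # estimator for the non-differentiable sign; only the forward
    # pass is sketched here.
    return binarize(x) @ binarize(w).T

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))   # toy activations: 2 tokens, 8 features
w = rng.standard_normal((4, 8))   # toy weight matrix: 4 output features
y = binary_linear(x, w)
print(y.shape)  # (2, 4)
```

Because each binarized tensor takes only two values (±alpha), weights can be stored as single bits plus one floating-point scale, which is where the >99% parameter compression claimed in the abstract comes from.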
ISSN: 1939-1404; 2151-1535