A Lightweight CNN-Transformer Implemented via Structural Re-Parameterization and Hybrid Attention for Remote Sensing Image Super-Resolution

Remote sensing imagery contains rich information about geographical targets, and performing super-resolution (SR) reconstruction on such images requires greater feature representation capabilities. Convolutional neural network (CNN)-based methods excel at extracting intricate local features but fall...


Bibliographic Details
Main Authors:	Jie Wang, Hongwei Li, Yifan Li, Zilong Qin
Format:	Article
Language:	English
Published:	MDPI AG 2024-12-01
Series:	ISPRS International Journal of Geo-Information
Subjects:	super-resolution remote sensing CNN-Transformer lightweight hybrid attention
Online Access:	https://www.mdpi.com/2220-9964/14/1/8

author	Jie Wang Hongwei Li Yifan Li Zilong Qin
collection	DOAJ
description	Remote sensing imagery contains rich information about geographical targets, and performing super-resolution (SR) reconstruction on such images requires greater feature representation capabilities. Convolutional neural network (CNN)-based methods excel at extracting intricate local features but fall short in terms of capturing global representations. While transformer methods are capable of learning long-distance dependencies, they often overlook local feature details, which can diminish the discriminability between the background and the foreground. Moreover, the distinctive architectures of transformers, their extensive parameter counts, and their reliance on large-scale training datasets impose constraints on transformer applications in remote sensing image feature extraction tasks. To address these challenges, this study introduces a novel hybrid CNN-Transformer network model named RepCHAT for remote sensing single image reconstruction, which incorporates a structural re-parameterization technique and a hybrid attention mechanism. This method leverages the strengths of transformers in terms of learning long-distance dependencies (global features) and CNNs with respect to extracting local features. The proposed approach achieves SR reconstruction for remote sensing images with fewer parameters and less computational overhead than those of traditional transformers and high-performance CNN models. We develop a multiscale feature extraction module that integrates both spatial- and frequency-domain features and employs structural re-parameterization theory to increase the inference efficiency of the model. Furthermore, we incorporate depthwise-separable convolution into the transformer block to bolster the local feature learning capabilities of the transformer. 
The proposed method achieves the best performance for remote sensing single-image super-resolution reconstruction, outperforming the competing methods by 0.28–1.05 dB (×4 scale) in terms of peak signal-to-noise ratio (PSNR). Experimental results indicate that RepCHAT maintains high performance with significantly reduced complexity, making it suitable for deployment on edge devices.
format	Article
id	doaj-art-5384e68668b646f0b31870e0cce4881c
institution	Kabale University
issn	2220-9964
language	English
publishDate	2024-12-01
publisher	MDPI AG
record_format	Article
series	ISPRS International Journal of Geo-Information
spelling	doaj-art-5384e68668b646f0b31870e0cce4881c; 2025-01-24T13:34:57Z; eng; MDPI AG; ISPRS International Journal of Geo-Information; 2220-9964; 2024-12-01; 14; 1; 8; 10.3390/ijgi14010008; Jie Wang (School of Geo-Science & Technology, Zhengzhou University, Zhengzhou 450052, China); Hongwei Li (School of Geo-Science & Technology, Zhengzhou University, Zhengzhou 450052, China); Yifan Li (Institute for Geophysics and Meteorology, University of Cologne, 50923 Cologne, Germany); Zilong Qin (School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China)
title	A Lightweight CNN-Transformer Implemented via Structural Re-Parameterization and Hybrid Attention for Remote Sensing Image Super-Resolution
topic	super-resolution remote sensing CNN-Transformer lightweight hybrid attention
url	https://www.mdpi.com/2220-9964/14/1/8


Similar Items