A Lightweight CNN-Transformer Implemented via Structural Re-Parameterization and Hybrid Attention for Remote Sensing Image Super-Resolution


Bibliographic Details
Main Authors: Jie Wang, Hongwei Li, Yifan Li, Zilong Qin
Format: Article
Language: English
Published: MDPI AG 2024-12-01
Series: ISPRS International Journal of Geo-Information
Subjects: super-resolution; remote sensing; CNN-Transformer; lightweight; hybrid attention
Online Access: https://www.mdpi.com/2220-9964/14/1/8
_version_ 1832588342113337344
author Jie Wang
Hongwei Li
Yifan Li
Zilong Qin
collection DOAJ
description Remote sensing imagery contains rich information about geographical targets, so super-resolution (SR) reconstruction of such images demands strong feature representation capabilities. Convolutional neural network (CNN)-based methods excel at extracting intricate local features but fall short in capturing global representations. Transformer-based methods can learn long-distance dependencies, but they often overlook local feature details, which can reduce the discriminability between background and foreground. Moreover, the distinctive architectures of transformers, their large parameter counts, and their reliance on large-scale training datasets constrain their use in remote sensing image feature extraction tasks. To address these challenges, this study introduces RepCHAT, a novel hybrid CNN-Transformer network for remote sensing single-image SR reconstruction that incorporates a structural re-parameterization technique and a hybrid attention mechanism. The method combines the strength of transformers in learning long-distance dependencies (global features) with that of CNNs in extracting local features, and it achieves SR reconstruction for remote sensing images with fewer parameters and less computational overhead than traditional transformers and high-performance CNN models. We develop a multiscale feature extraction module that integrates both spatial- and frequency-domain features and employs structural re-parameterization to increase the inference efficiency of the model. Furthermore, we incorporate depthwise-separable convolution into the transformer block to strengthen its local feature learning capability. The proposed method achieves the best performance among the compared methods for remote sensing single-image SR reconstruction, outperforming them by 0.28–1.05 dB (×4 scale) in terms of peak signal-to-noise ratio (PSNR). Experimental results indicate that RepCHAT maintains high performance with significantly reduced complexity, making it suitable for deployment on edge devices.
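The description mentions structural re-parameterization as the means of increasing inference efficiency but does not say which branches RepCHAT merges. As a generic illustration of the idea only, the sketch below assumes a single-channel block with a 3x3 branch, a 1x1 branch, and an identity shortcut, with no batch normalization. The function names and the branch layout are hypothetical and are not taken from the article. Because all three branches are linear, they can be folded into one 3x3 kernel that produces the same output with a single convolution.

```python
# Hypothetical, simplified sketch: single channel, stride 1, zero padding,
# no batch normalization. This is not the RepCHAT implementation.


def conv3x3_same(image, kernel):
    """Apply a 3x3 kernel (cross-correlation) with zero padding, keeping size."""
    height, width = len(image), len(image[0])
    output = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            total = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    # Positions outside the image count as zeros.
                    if 0 <= y < height and 0 <= x < width:
                        total += kernel[di + 1][dj + 1] * image[y][x]
            output[i][j] = total
    return output


def multi_branch(image, k3, k1):
    """Training-time form: 3x3 branch + 1x1 branch (scalar k1) + identity."""
    branch3 = conv3x3_same(image, k3)
    return [
        [branch3[i][j] + k1 * image[i][j] + image[i][j]
         for j in range(len(image[0]))]
        for i in range(len(image))
    ]


def fuse_branches(k3, k1):
    """Inference-time form: fold the 1x1 and identity branches into the 3x3 kernel."""
    fused = [row[:] for row in k3]
    # A 1x1 convolution and the identity only touch the center tap.
    fused[1][1] += k1 + 1.0
    return fused


if __name__ == "__main__":
    image = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0], [2.5, 1.5, -0.5]]
    k3 = [[0.5, -0.25, 0.0], [1.0, 0.25, -0.5], [0.0, 0.75, 0.125]]
    k1 = 0.5
    slow = multi_branch(image, k3, k1)
    fast = conv3x3_same(image, fuse_branches(k3, k1))
    max_diff = max(abs(a - b) for ra, rb in zip(slow, fast) for a, b in zip(ra, rb))
    print("max difference between the two forms:", max_diff)
```

In a multi-channel network the same folding is applied per kernel, and any normalization layers would have to be merged into the weights first. Those steps are omitted here.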
format Article
id doaj-art-5384e68668b646f0b31870e0cce4881c
institution Kabale University
issn 2220-9964
language English
publishDate 2024-12-01
publisher MDPI AG
record_format Article
series ISPRS International Journal of Geo-Information
spelling doaj-art-5384e68668b646f0b31870e0cce4881c
2025-01-24T13:34:57Z
eng
MDPI AG
ISPRS International Journal of Geo-Information
2220-9964
2024-12-01
vol. 14, no. 1, art. 8
10.3390/ijgi14010008
A Lightweight CNN-Transformer Implemented via Structural Re-Parameterization and Hybrid Attention for Remote Sensing Image Super-Resolution
Jie Wang (School of Geo-Science & Technology, Zhengzhou University, Zhengzhou 450052, China)
Hongwei Li (School of Geo-Science & Technology, Zhengzhou University, Zhengzhou 450052, China)
Yifan Li (Institute for Geophysics and Meteorology, University of Cologne, 50923 Cologne, Germany)
Zilong Qin (School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China)
https://www.mdpi.com/2220-9964/14/1/8
super-resolution
remote sensing
CNN-Transformer
lightweight
hybrid attention
title A Lightweight CNN-Transformer Implemented via Structural Re-Parameterization and Hybrid Attention for Remote Sensing Image Super-Resolution
topic super-resolution
remote sensing
CNN-Transformer
lightweight
hybrid attention
url https://www.mdpi.com/2220-9964/14/1/8