HETMCL: High-Frequency Enhancement Transformer and Multi-Layer Context Learning Network for Remote Sensing Scene Classification

Bibliographic Details
Main Authors: Haiyan Xu, Yanni Song, Gang Xu, Ke Wu, Jianguang Wen
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Sensors
Subjects: remote sensing scene classification (RSSC); convolutional neural network (CNN); transformer
Online Access: https://www.mdpi.com/1424-8220/25/12/3769
collection DOAJ
description Remote Sensing Scene Classification (RSSC) is an important and challenging research topic. Transformer-based methods have shown encouraging performance in capturing global dependencies. However, recent studies have revealed that Transformers perform poorly at capturing high-frequency components, which mainly convey local information. To solve this problem, we propose a novel method based on a High-Frequency Enhanced Vision Transformer and Multi-Layer Context Learning (HETMCL), which effectively learns comprehensive features from both the high-frequency and low-frequency information in visual data. First, Convolutional Neural Networks (CNNs) extract low-level spatial structures, and the Adjacent Layer Feature Fusion Module (AFFM) reduces semantic gaps between layers to enhance spatial context. Second, the High-Frequency Information Enhancement Vision Transformer (HFIE) includes a High-to-Low-Frequency Token Mixer (HLFTM), which captures high-frequency details. Finally, the Multi-Layer Context Alignment Attention (MCAA) integrates multi-layer features and contextual relationships. On the UCM, AID, and NWPU datasets, HETMCL achieves state-of-the-art overall accuracy (OA) of 99.76%, 97.32%, and 95.02%, respectively, outperforming existing methods by up to 0.38%.
id doaj-art-995c580206004f93aef176adb17453f5
institution OA Journals
issn 1424-8220
spelling doaj-art-995c580206004f93aef176adb17453f5
Record timestamp: 2025-08-20T02:21:58Z
Language: eng
Publisher: MDPI AG
Journal: Sensors
ISSN: 1424-8220
Published: 2025-06-01
Volume 25, Issue 12, Article 3769
DOI: 10.3390/s25123769
Title: HETMCL: High-Frequency Enhancement Transformer and Multi-Layer Context Learning Network for Remote Sensing Scene Classification
Authors and affiliations:
Haiyan Xu (Zhejiang College of Security Technology, Wenzhou 325000, China)
Yanni Song (Institute of Geophysics and Geomatics, China University of Geosciences, Wuhan 430074, China)
Gang Xu (Zhejiang College of Security Technology, Wenzhou 325000, China)
Ke Wu (Institute of Geophysics and Geomatics, China University of Geosciences, Wuhan 430074, China)
Jianguang Wen (State Key Laboratory of Remote Sensing Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101, China)
URL: https://www.mdpi.com/1424-8220/25/12/3769
Keywords: remote sensing scene classification (RSSC); convolutional neural network (CNN); transformer
topic remote sensing scene classification (RSSC)
convolutional neural network (CNN)
transformer