HETMCL: High-Frequency Enhancement Transformer and Multi-Layer Context Learning Network for Remote Sensing Scene Classification
Remote Sensing Scene Classification (RSSC) is an important and challenging research topic. Transformer-based methods have shown encouraging performance in capturing global dependencies. However, recent studies have revealed that Transformers perform poorly in capturing high frequencies that mainly c...
| Main Authors: | Haiyan Xu, Yanni Song, Gang Xu, Ke Wu, Jianguang Wen |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-06-01 |
| Series: | Sensors |
| Subjects: | remote sensing scene classification (RSSC); convolutional neural network (CNN); transformer |
| Online Access: | https://www.mdpi.com/1424-8220/25/12/3769 |
| _version_ | 1850164555045404672 |
|---|---|
| author | Haiyan Xu; Yanni Song; Gang Xu; Ke Wu; Jianguang Wen |
| author_sort | Haiyan Xu |
| collection | DOAJ |
| description | Remote sensing scene classification (RSSC) is an important and challenging research topic. Transformer-based methods capture global dependencies well, but recent studies show that they capture high-frequency components, which mainly convey local information, poorly. To address this, we propose HETMCL, a method based on a High-Frequency Enhanced Vision Transformer and Multi-Layer Context Learning that learns comprehensive features from both the high-frequency and low-frequency information in visual data. First, Convolutional Neural Networks (CNNs) extract low-level spatial structures, and the Adjacent Layer Feature Fusion Module (AFFM) reduces semantic gaps between layers to enhance spatial context. Second, the High-Frequency Information Enhancement Vision Transformer (HFIE) includes a High-to-Low-Frequency Token Mixer (HLFTM), which captures high-frequency details. Finally, the Multi-Layer Context Alignment Attention (MCAA) integrates multi-layer features and contextual relationships. On the UCM, AID, and NWPU datasets, HETMCL achieves state-of-the-art overall accuracy (OA) of 99.76%, 97.32%, and 95.02%, respectively, outperforming existing methods by up to 0.38%. |
| format | Article |
| id | doaj-art-995c580206004f93aef176adb17453f5 |
| institution | OA Journals |
| issn | 1424-8220 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Sensors |
| spelling | doaj-art-995c580206004f93aef176adb17453f5; 2025-08-20T02:21:58Z; eng; MDPI AG; Sensors; ISSN 1424-8220; 2025-06-01; vol. 25, no. 12, art. 3769; DOI 10.3390/s25123769; HETMCL: High-Frequency Enhancement Transformer and Multi-Layer Context Learning Network for Remote Sensing Scene Classification; Haiyan Xu (Zhejiang College of Security Technology, Wenzhou 325000, China); Yanni Song (Institute of Geophysics and Geomatics, China University of Geosciences, Wuhan 430074, China); Gang Xu (Zhejiang College of Security Technology, Wenzhou 325000, China); Ke Wu (Institute of Geophysics and Geomatics, China University of Geosciences, Wuhan 430074, China); Jianguang Wen (State Key Laboratory of Remote Sensing Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101, China); https://www.mdpi.com/1424-8220/25/12/3769; remote sensing scene classification (RSSC); convolutional neural network (CNN); transformer |
| title | HETMCL: High-Frequency Enhancement Transformer and Multi-Layer Context Learning Network for Remote Sensing Scene Classification |
| topic | remote sensing scene classification (RSSC); convolutional neural network (CNN); transformer |
| url | https://www.mdpi.com/1424-8220/25/12/3769 |
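
The description distinguishes high-frequency information (local detail such as edges) from low-frequency information (smooth, global structure). The following is a minimal sketch of that distinction only, not the authors' HLFTM or any other HETMCL module: it assumes a simple box blur as the low-pass filter and takes the residual as the high-frequency band. The names `box_blur`, `split_frequencies`, and `energy` are hypothetical helpers introduced for illustration.

```python
def box_blur(img, radius=1):
    """Low-pass filter: mean over a (2r+1)x(2r+1) window, clipped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [img[j][i] for j in ys for i in xs]
            out[y][x] = sum(vals) / len(vals)
    return out


def split_frequencies(img, radius=1):
    """Return (low, high) where low is the blurred image and high = img - low."""
    low = box_blur(img, radius)
    high = [[img[y][x] - low[y][x] for x in range(len(img[0]))]
            for y in range(len(img))]
    return low, high


def energy(band):
    """Sum of squared values, used here as a crude measure of band content."""
    return sum(v * v for row in band for v in row)


# A constant image has no local detail; a vertical step edge does.
flat = [[5.0] * 6 for _ in range(6)]
edge = [[0.0] * 3 + [10.0] * 3 for _ in range(6)]

flat_low, flat_high = split_frequencies(flat)
edge_low, edge_high = split_frequencies(edge)

print("flat high-frequency energy:", energy(flat_high))
print("edge high-frequency energy:", energy(edge_high))
```

In this toy setting the high-frequency band is empty for the constant image and concentrated around the step for the edge image, which is the kind of local detail the description says Transformers tend to capture poorly.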