GLEM: a global–local enhancement method for fine-grained image recognition with attention erasure and multi-view cropping
Abstract Fine-grained image recognition (FGIR) aims to distinguish between visual objects and their subcategories with subtle differences. Due to the highly similar features between categories in fine-grained image recognition tasks, the model requires more substantial discriminative capability. Existing methods mainly focus on learning prominent visual patterns, often neglecting other potential features, which makes it difficult for the model to fully distinguish subtle differences in both global and local features of objects, thus limiting the performance of FGIR tasks. This work proposes a Global–Local Enhanced Module (GLEM) to integrate global and local features to address these issues effectively. GLEM is based on channel-aware attention mechanisms and explores new feature details through adaptive erasure and dynamic fusion strategies, preventing the model from overly focusing on prominent regions. At the same time, GLEM utilizes multi-view cropping techniques to capture subtle differences between global and local features effectively. We conduct extensive experiments on three FGIR benchmark datasets, and the results demonstrate that the proposed GLEM method achieves state-of-the-art performance.
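The record contains no implementation, so the sketch below only illustrates, under stated assumptions, the two ideas named in the abstract: erasing the most-attended regions of a feature map so the model must look for other cues, and pairing a global view with an attention-guided local crop. All names (`channel_attention_map`, `erase_salient_regions`, `multi_view_crops`), the 0.8 erasure threshold, and the 224-pixel crop size are hypothetical and are not taken from the paper.

```python
# Minimal PyTorch-style sketch of attention erasure and multi-view cropping.
# This is an illustration of the abstract's ideas, not the authors' code.
import torch
import torch.nn.functional as F

def channel_attention_map(features: torch.Tensor) -> torch.Tensor:
    """Collapse a (B, C, H, W) feature map into a (B, 1, H, W) attention map
    by weighting each channel with its global-average activation."""
    weights = features.mean(dim=(2, 3), keepdim=True)           # (B, C, 1, 1)
    attn = (weights * features).sum(dim=1, keepdim=True)        # (B, 1, H, W)
    attn = attn - attn.amin(dim=(2, 3), keepdim=True)
    attn = attn / (attn.amax(dim=(2, 3), keepdim=True) + 1e-6)  # scale to [0, 1]
    return attn

def erase_salient_regions(features: torch.Tensor, threshold: float = 0.8) -> torch.Tensor:
    """Zero out the most-attended locations (the 'erasure' idea), forcing
    later layers to rely on less prominent features."""
    attn = channel_attention_map(features)
    keep_mask = (attn < threshold).float()                      # 1 = keep, 0 = erase
    return features * keep_mask

def multi_view_crops(image: torch.Tensor, attn: torch.Tensor, crop_size: int = 224):
    """Return a resized global view plus a local crop centred on the attention peak."""
    b, _, h, w = image.shape
    global_view = F.interpolate(image, size=(crop_size, crop_size),
                                mode="bilinear", align_corners=False)
    attn_up = F.interpolate(attn, size=(h, w), mode="bilinear", align_corners=False)
    flat_idx = attn_up.flatten(2).argmax(dim=2)                 # (B, 1) peak index
    ys, xs = flat_idx // w, flat_idx % w
    local_views = []
    for i in range(b):
        y0 = int(ys[i].clamp(crop_size // 2, h - crop_size // 2)) - crop_size // 2
        x0 = int(xs[i].clamp(crop_size // 2, w - crop_size // 2)) - crop_size // 2
        local_views.append(image[i:i + 1, :, y0:y0 + crop_size, x0:x0 + crop_size])
    return global_view, torch.cat(local_views, dim=0)

if __name__ == "__main__":
    feats = torch.randn(2, 256, 14, 14)   # dummy backbone features
    imgs = torch.randn(2, 3, 448, 448)    # dummy input images
    erased = erase_salient_regions(feats)
    g, l = multi_view_crops(imgs, channel_attention_map(feats))
    print(erased.shape, g.shape, l.shape)
```

How the erasure is made adaptive and how the two views are fused dynamically is specific to the paper and is not reproduced here.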
| Main Authors: | Chenglong Zhou, Damin Zhang, Qing He, MingFang Li, MingRong Li, Xiaobo Zhou |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-07-01 |
| Series: | Journal of King Saud University: Computer and Information Sciences |
| Subjects: | Fine-grained image recognition; Channel-aware attention; Adaptive erasure; Dynamic fusion; Multi-view cropping |
| Online Access: | https://doi.org/10.1007/s44443-025-00120-4 |
| _version_ | 1849342104945295360 |
|---|---|
| author | Chenglong Zhou Damin Zhang Qing He MingFang Li MingRong Li Xiaobo Zhou |
| author_sort | Chenglong Zhou |
| collection | DOAJ |
| description | Abstract Fine-grained image recognition (FGIR) aims to distinguish between visual objects and their subcategories with subtle differences. Due to the highly similar features between categories in fine-grained image recognition tasks, the model requires more substantial discriminative capability. Existing methods mainly focus on learning prominent visual patterns, often neglecting other potential features, which makes it difficult for the model to fully distinguish subtle differences in both global and local features of objects, thus limiting the performance of FGIR tasks. This work proposes a Global–Local Enhanced Module (GLEM) to integrate global and local features to address these issues effectively. GLEM is based on channel-aware attention mechanisms and explores new feature details through adaptive erasure and dynamic fusion strategies, preventing the model from overly focusing on prominent regions. At the same time, GLEM utilizes multi-view cropping techniques to capture subtle differences between global and local features effectively. We conduct extensive experiments on three FGIR benchmark datasets, and the results demonstrate that the proposed GLEM method achieves state-of-the-art performance. |
| format | Article |
| id | doaj-art-b2c7d153d8454d9d8108ca33f457bed3 |
| institution | Kabale University |
| issn | 1319-1578 2213-1248 |
| language | English |
| publishDate | 2025-07-01 |
| publisher | Springer |
| record_format | Article |
| series | Journal of King Saud University: Computer and Information Sciences |
| spelling | Chenglong Zhou; Damin Zhang; Qing He; MingFang Li; MingRong Li; Xiaobo Zhou (all: College of Big Data and Information Engineering, Guizhou University). GLEM: a global–local enhancement method for fine-grained image recognition with attention erasure and multi-view cropping. Journal of King Saud University: Computer and Information Sciences, Springer, 2025-07-01. ISSN 1319-1578, 2213-1248. https://doi.org/10.1007/s44443-025-00120-4 |
| title | GLEM: a global–local enhancement method for fine-grained image recognition with attention erasure and multi-view cropping |
| topic | Fine-grained image recognition Channel-aware attention Adaptive erasure Dynamic fusion Multi-view cropping |
| url | https://doi.org/10.1007/s44443-025-00120-4 |