GLEM: a global–local enhancement method for fine-grained image recognition with attention erasure and multi-view cropping

Bibliographic Details
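Main Authors: Chenglong Zhou, Damin Zhang, Qing He, MingFang Li, MingRong Li, Xiaobo Zhou
Affiliation: College of Big Data and Information Engineering, Guizhou University
Format: Article
Language: English
Published: Springer 2025-07-01
Series: Journal of King Saud University: Computer and Information Sciences
ISSN: 1319-1578, 2213-1248
Subjects: Fine-grained image recognition; Channel-aware attention; Adaptive erasure; Dynamic fusion; Multi-view cropping
Collection: DOAJ
Online Access: https://doi.org/10.1007/s44443-025-00120-4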

Abstract
Fine-grained image recognition (FGIR) aims to distinguish between visual objects and their subcategories with subtle differences. Due to the highly similar features between categories in fine-grained image recognition tasks, the model requires more substantial discriminative capability. Existing methods mainly focus on learning prominent visual patterns, often neglecting other potential features, which makes it difficult for the model to fully distinguish subtle differences in both global and local features of objects, thus limiting the performance of FGIR tasks. This work proposes a Global–Local Enhanced Module (GLEM) to integrate global and local features to address these issues effectively. GLEM is based on channel-aware attention mechanisms and explores new feature details through adaptive erasure and dynamic fusion strategies, preventing the model from overly focusing on prominent regions. At the same time, GLEM utilizes multi-view cropping techniques to capture subtle differences between global and local features effectively. We conduct extensive experiments on three FGIR benchmark datasets, and the results demonstrate that the proposed GLEM method achieves state-of-the-art performance.