Research on Fine-Grained Visual Classification Method Based on Dual-Attention Feature Complementation

Bibliographic Details
Main Authors: Min Huang, Ke Li, Xiaoyan Yu, Chen Yang
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10577094/
Description
Summary: Fine-grained image classification is a notable challenge in computer vision. The main difficulty is that visually similar images often carry different labels; that is, inter-class similarity is high and intra-class similarity is low. A growing number of fine-grained classification models use attention mechanisms to extract discriminative regions to address this issue, yet they overlook other equally discriminative but less obvious features. Moreover, these mechanisms typically enhance features along only one dimension while neglecting the other, and features extracted from intermediate layers are not used effectively. To tackle these problems, we propose a fine-grained visual classification model based on dual-attention feature complementation. The model obtains features enhanced in both dimensions through cross-attention over the two dimensions, and it encourages the network to explore other potentially discriminative regions by suppressing the already enhanced features. In addition, a feature-pyramid approach is used to obtain multi-scale features, and an outer product is applied to model relationships among feature components, improving the use of intermediate-layer features and the learning of fine-grained characteristics. Experiments show that our method requires no annotations beyond image labels and achieves satisfactory performance on several public fine-grained benchmark datasets.
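
The record does not include the authors' implementation; as a rough illustration of what dual-dimensional attention with feature suppression can look like, the following PyTorch sketch (module names, the reduction ratio, and the suppression quantile are assumptions, not details from the paper) enhances a backbone feature map along the channel and spatial dimensions and then masks out the most strongly attended positions so that a second pass can search for other discriminative regions.

```python
# Hypothetical sketch of dual-dimensional (channel + spatial) attention with
# feature suppression; this is NOT the authors' released code.
import torch
import torch.nn as nn


class DualDimensionalAttention(nn.Module):
    """Enhance a feature map along the channel and spatial dimensions."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatial dims, excite channels.
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: squeeze channels, excite positions.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor):
        ch_att = self.channel_mlp(x)                       # (B, C, 1, 1)
        sp_in = torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1
        )
        sp_att = self.spatial_conv(sp_in)                  # (B, 1, H, W)
        # Cross-apply both attention maps so each channel and each position
        # is re-weighted by information from the other dimension.
        enhanced = x * ch_att * sp_att
        return enhanced, sp_att


def suppress_top_regions(x: torch.Tensor, sp_att: torch.Tensor, keep: float = 0.8):
    """Zero out the most strongly attended positions (hypothetical quantile
    threshold) so the network must look for other discriminative regions."""
    b = sp_att.shape[0]
    flat = sp_att.view(b, -1)
    thresh = torch.quantile(flat, keep, dim=1, keepdim=True)
    mask = (flat < thresh).float().view_as(sp_att)
    return x * mask


if __name__ == "__main__":
    feat = torch.randn(2, 256, 14, 14)                     # dummy backbone features
    attn = DualDimensionalAttention(256)
    enhanced, sp_att = attn(feat)
    suppressed = suppress_top_regions(feat, sp_att)        # input for a second pass
    print(enhanced.shape, suppressed.shape)
```

The summary also mentions combining multi-scale feature-pyramid outputs through an outer product. A minimal bilinear-pooling-style sketch is given below, again with assumed shapes and normalization choices rather than the paper's actual design; it sums the outer products of two feature maps over spatial positions to capture pairwise interactions between feature components.

```python
# Minimal, hypothetical sketch of outer-product (bilinear) pooling between two
# pyramid levels; shapes and the sign-sqrt/L2 normalization are assumptions.
import torch
import torch.nn.functional as F


def bilinear_pool(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """Pool the outer product of two feature maps over spatial positions.

    feat_a: (B, Ca, H, W), feat_b: (B, Cb, H, W) -> (B, Ca * Cb)
    """
    b, ca, h, w = feat_a.shape
    cb = feat_b.shape[1]
    a = feat_a.reshape(b, ca, h * w)
    bm = feat_b.reshape(b, cb, h * w)
    # Average of outer products a_i b_i^T over all spatial positions i.
    pooled = torch.bmm(a, bm.transpose(1, 2)) / (h * w)    # (B, Ca, Cb)
    pooled = pooled.reshape(b, ca * cb)
    # Signed square-root and L2 normalization, as commonly paired with
    # bilinear features (an assumption, not taken from the record).
    pooled = torch.sign(pooled) * torch.sqrt(pooled.abs() + 1e-12)
    return F.normalize(pooled, dim=1)


if __name__ == "__main__":
    # e.g. a deeper pyramid level upsampled to match a shallower one
    shallow = torch.randn(2, 64, 14, 14)
    deep = F.interpolate(torch.randn(2, 128, 7, 7), size=(14, 14),
                         mode="bilinear", align_corners=False)
    print(bilinear_pool(shallow, deep).shape)              # torch.Size([2, 8192])
```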
ISSN: 2169-3536