SAVE: Self-Attention on Visual Embedding for Zero-Shot Generic Object Counting

Bibliographic Details
Main Authors: Ahmed Zgaren, Wassim Bouachir, Nizar Bouguila
Format: Article
Language: English
Published: MDPI AG 2025-02-01
Series: Journal of Imaging
Online Access: https://www.mdpi.com/2313-433X/11/2/52
Description
Summary: Zero-shot counting is a subcategory of Generic Visual Object Counting, which aims to count objects from an arbitrary class in a given image. While few-shot counting relies on delivering exemplars to the model to count similar class objects, zero-shot counting automates the operation for faster processing. This paper proposes a fully automated zero-shot method outperforming both zero-shot and few-shot methods. By exploiting feature maps from a pre-trained detection-based backbone, we introduce a new Visual Embedding Module designed to generate semantic embeddings within object contextual information. These embeddings are then fed to a Self-Attention Matching Module to generate an encoded representation for the head counter. Our proposed method has outperformed recent zero-shot approaches, achieving the best Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) results of 8.89 and 35.83, respectively, on the FSC147 dataset. Additionally, our method demonstrates competitive performance compared to few-shot methods, advancing the capabilities of visual object counting in various industrial applications such as tree counting, wildlife animal counting, and medical applications like blood cell counting.
ISSN: 2313-433X
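
Note on the metrics: the summary reports MAE and RMSE over per-image object counts. The sketch below shows the standard definitions of both metrics as they are commonly applied to counting benchmarks. It is not taken from the paper's code; the function names `mae` and `rmse` and the sample counts are hypothetical and chosen only for illustration.

```python
import math


def mae(predicted, actual):
    """Mean Absolute Error: average of |predicted count - true count| per image."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def rmse(predicted, actual):
    """Root Mean Square Error: square root of the mean squared count error."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


# Hypothetical per-image counts (illustrative values, not results from the paper).
predicted_counts = [10, 22, 31, 4]
true_counts = [12, 20, 35, 4]

print("MAE: ", mae(predicted_counts, true_counts))
print("RMSE:", rmse(predicted_counts, true_counts))
```

Because RMSE squares each error before averaging, a few images with large miscounts raise it much more than they raise MAE, which is one plausible reason a counting method can show an RMSE well above its MAE.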