SViG: A Similarity-Thresholded Approach for Vision Graph Neural Networks
| Main Authors: | , , |
| --- | --- |
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/10845790/ |
| Summary: | Image representation in computer vision is a long-standing problem with a significant impact on the performance of any machine learning model. Multiple attempts to tackle it have been introduced in the literature, from traditional Convolutional Neural Networks (CNNs) to the more recent Vision Transformers and MLP-Mixers, which represent images as sequences. Most recently, Vision Graph Neural Networks (ViG) have shown very promising performance by representing images as graphs. The performance of ViG models heavily depends on how the graph is constructed. The ViG model relies on k-nearest neighbors (k-NN) for graph construction, which, while achieving very good performance on classical computer vision tasks, poses several challenges: determining the optimal value of k, and using that same value for all nodes in a graph, which in turn reduces the graph's expressiveness and limits the power of the model. In this paper, we propose a new approach that relies on similarity-score thresholding to create the graph edges and, subsequently, pick the neighboring nodes. Rather than the number of neighbors, we allow the normalized similarity threshold to be specified as an input parameter for each layer, which is more intuitive. We also propose a decreasing-threshold framework to select the input threshold for all layers. We show that our proposed method achieves higher performance than the ViG model for image classification on the benchmark ImageNet-1K dataset, without increasing the complexity of the model. PyTorch code and checkpoints are available at https://github.com/IsmaelElsharkawi/SViG. |
| ISSN: | 2169-3536 |
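The summary's core idea, building graph edges by thresholding a normalized similarity score instead of taking a fixed k nearest neighbors, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function name `similarity_threshold_graph`, the use of cosine similarity, and the mapping of similarities from [-1, 1] to [0, 1] are all assumptions made here for clarity.

```python
import numpy as np

def similarity_threshold_graph(features, tau):
    """Build an adjacency matrix by thresholding normalized pairwise similarity.

    features: (n, d) array of node (patch) feature vectors.
    tau: normalized similarity threshold in [0, 1]; higher tau -> sparser graph.
    Illustrative sketch of the SViG idea, not the paper's exact code.
    """
    # L2-normalize features so dot products become cosine similarities in [-1, 1]
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    x = features / np.clip(norms, 1e-12, None)
    sim = x @ x.T                     # pairwise cosine similarity, shape (n, n)
    sim_norm = (sim + 1.0) / 2.0      # map [-1, 1] -> [0, 1] (an assumed normalization)
    adj = sim_norm >= tau             # edge wherever similarity clears the threshold
    np.fill_diagonal(adj, False)      # drop self-loops
    return adj

# Unlike k-NN, the neighborhood size now varies per node with the data;
# a decreasing-threshold schedule would pass a smaller tau to deeper layers,
# e.g. taus = [0.95, 0.90, 0.85, ...], one threshold per layer.
```

Because edges are defined by the threshold rather than a fixed count, similar patches can gather many neighbors while dissimilar ones keep few, which is the added expressiveness the abstract refers to; the decreasing-threshold framework then only needs a per-layer schedule of tau values instead of a per-layer k.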