SViG: A Similarity-Thresholded Approach for Vision Graph Neural Networks

Image representation in computer vision is a long-standing problem that has a significant impact on the performance of any machine learning model. Multiple attempts to tackle this problem have been introduced in the literature, ranging from traditional Convolutional Neural Networks (CNNs)...


Bibliographic Details
Main Authors: Ismael Elsharkawi, Hossam Sharara, Ahmed Rafea
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Graph Neural Networks; Vision Graph Neural Networks; Image Classification
Online Access: https://ieeexplore.ieee.org/document/10845790/
_version_ 1832575615599902720
author Ismael Elsharkawi
Hossam Sharara
Ahmed Rafea
author_facet Ismael Elsharkawi
Hossam Sharara
Ahmed Rafea
author_sort Ismael Elsharkawi
collection DOAJ
description Image representation in computer vision is a long-standing problem that has a significant impact on the performance of any machine learning model. Multiple attempts to tackle this problem have been introduced in the literature, ranging from traditional Convolutional Neural Networks (CNNs) to the more recently introduced Vision Transformers and MLP-Mixers, which represent images as sequences. Most recently, Vision Graph Neural Networks (ViG) have shown very promising performance by representing images as graphs. The performance of ViG models heavily depends on how the graph is constructed. The ViG model relies on k-nearest neighbors (k-NN) for graph construction, which, while achieving very good performance on classical computer vision tasks, imposes a number of challenges, such as determining the optimal value for k and using the same chosen value for all nodes in a graph, which in turn reduces the graph's expressiveness and limits the power of the model. In this paper, we propose a new approach that relies on similarity-score thresholding to create the graph edges and, subsequently, pick the neighboring nodes. Rather than the number of neighbors, we allow the normalized similarity threshold to be specified as an input parameter for each layer, which is more intuitive. We also propose a decreasing-threshold framework to select the input threshold for all layers. We show that our proposed method achieves higher performance than the ViG model for image classification on the benchmark ImageNet-1K dataset, without increasing the complexity of the model. PyTorch code and checkpoints are available at https://github.com/IsmaelElsharkawi/SViG.
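For illustration, the following is a minimal PyTorch sketch of the idea summarized in the abstract: building graph edges by thresholding a normalized similarity matrix instead of taking a fixed k nearest neighbors, with a decreasing per-layer threshold schedule. The function names, the threshold schedule, and the tensor shapes are illustrative assumptions and are not taken from the authors' released code.

import torch
import torch.nn.functional as F

def knn_edges(x: torch.Tensor, k: int) -> torch.Tensor:
    # ViG-style construction: every node keeps exactly its k most similar neighbors.
    sim = F.normalize(x, dim=-1) @ F.normalize(x, dim=-1).T   # (N, N) cosine similarity
    nbr = sim.topk(k, dim=-1).indices                          # (N, k) neighbor indices
    src = torch.arange(x.size(0)).repeat_interleave(k)
    return torch.stack([src, nbr.reshape(-1)])                 # (2, N*k) edge index

def thresholded_edges(x: torch.Tensor, tau: float) -> torch.Tensor:
    # Similarity-thresholded construction: keep edge (i, j) whenever the
    # normalized similarity is at least tau, so each node can have a
    # different number of neighbors (self-loops kept for simplicity).
    sim = F.normalize(x, dim=-1) @ F.normalize(x, dim=-1).T
    src, dst = torch.nonzero(sim >= tau, as_tuple=True)
    return torch.stack([src, dst])                             # (2, E), E depends on tau

# Hypothetical decreasing-threshold schedule: deeper layers use a lower
# threshold and therefore admit more neighbors per node.
num_layers = 12
taus = torch.linspace(0.9, 0.5, num_layers)

x = torch.randn(196, 64)          # e.g. 14x14 image patches with 64-dim features
edges_layer0 = thresholded_edges(x, taus[0].item())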
format Article
id doaj-art-c87a3f271b164d04835b6bed3110719d
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-c87a3f271b164d04835b6bed3110719d | 2025-01-31T23:04:41Z | eng | IEEE | IEEE Access | 2169-3536 | 2025-01-01 | Vol. 13, pp. 19379-19387 | DOI: 10.1109/ACCESS.2025.3531691 | Document 10845790 | SViG: A Similarity-Thresholded Approach for Vision Graph Neural Networks | Ismael Elsharkawi (https://orcid.org/0009-0002-8510-2884), Hossam Sharara (https://orcid.org/0000-0003-0042-9790), Ahmed Rafea (https://orcid.org/0000-0001-8109-1845), all with the Department of Computer Science and Engineering, The American University in Cairo, New Cairo, Egypt | https://ieeexplore.ieee.org/document/10845790/ | Graph Neural Networks; Vision Graph Neural Networks; Image Classification
spellingShingle Ismael Elsharkawi
Hossam Sharara
Ahmed Rafea
SViG: A Similarity-Thresholded Approach for Vision Graph Neural Networks
IEEE Access
Graph Neural Networks
Vision Graph Neural Networks
Image Classification
title SViG: A Similarity-Thresholded Approach for Vision Graph Neural Networks
title_full SViG: A Similarity-Thresholded Approach for Vision Graph Neural Networks
title_fullStr SViG: A Similarity-Thresholded Approach for Vision Graph Neural Networks
title_full_unstemmed SViG: A Similarity-Thresholded Approach for Vision Graph Neural Networks
title_short SViG: A Similarity-Thresholded Approach for Vision Graph Neural Networks
title_sort svig a similarity thresholded approach for vision graph neural networks
topic Graph Neural Networks
Vision Graph Neural Networks
Image Classification
url https://ieeexplore.ieee.org/document/10845790/
work_keys_str_mv AT ismaelelsharkawi svigasimilaritythresholdedapproachforvisiongraphneuralnetworks
AT hossamsharara svigasimilaritythresholdedapproachforvisiongraphneuralnetworks
AT ahmedrafea svigasimilaritythresholdedapproachforvisiongraphneuralnetworks