SViG: A Similarity-Thresholded Approach for Vision Graph Neural Networks

Image representation in computer vision is a long-standing problem that has a significant impact on the performance of any machine learning model. Multiple attempts to tackle this problem have been introduced in the literature, ranging from traditional Convolutional Neural Networks (CNNs)...


Bibliographic Details
Main Authors: Ismael Elsharkawi, Hossam Sharara, Ahmed Rafea
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Graph Neural Networks; Vision Graph Neural Networks; Image Classification
Online Access: https://ieeexplore.ieee.org/document/10845790/
_version_ 1832575615599902720
author Ismael Elsharkawi
Hossam Sharara
Ahmed Rafea
author_facet Ismael Elsharkawi
Hossam Sharara
Ahmed Rafea
author_sort Ismael Elsharkawi
collection DOAJ
description Image representation in computer vision is a long-standing problem that has a significant impact on the performance of any machine learning model. Multiple attempts to tackle this problem have been introduced in the literature, ranging from traditional Convolutional Neural Networks (CNNs) to the more recently introduced Vision Transformers and MLP-Mixers, which represent images as sequences. Most recently, Vision Graph Neural Networks (ViG) have shown very promising performance by representing images as graphs. The performance of ViG models heavily depends on how the graph is constructed. The ViG model relies on k-nearest neighbors (k-NN) for graph construction, which, while achieving very good performance on classical computer vision tasks, imposes a number of challenges, such as determining the optimal value for k and using the same chosen value for all nodes in a graph, which in turn reduces the graph's expressiveness and limits the power of the model. In this paper, we propose a new approach that relies on similarity-score thresholding to create the graph edges and, subsequently, pick the neighboring nodes. Rather than the number of neighbors, we allow the normalized similarity threshold to be specified as an input parameter for each layer, which is more intuitive. We also propose a decreasing-threshold framework to select the input threshold for all layers. We show that our proposed method achieves higher performance than the ViG model for image classification on the benchmark ImageNet-1K dataset, without increasing the complexity of the model. PyTorch code and checkpoints are available at https://github.com/IsmaelElsharkawi/SViG.
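For illustration, the following is a minimal PyTorch sketch of the idea summarized in the abstract: building graph edges by thresholding a normalized similarity matrix instead of taking a fixed k nearest neighbors, with a decreasing per-layer threshold schedule. The function names, the threshold schedule, and the tensor shapes are illustrative assumptions and are not taken from the authors' released code.

import torch
import torch.nn.functional as F

def knn_edges(x: torch.Tensor, k: int) -> torch.Tensor:
    # ViG-style construction: every node keeps exactly its k most similar neighbors.
    sim = F.normalize(x, dim=-1) @ F.normalize(x, dim=-1).T   # (N, N) cosine similarity
    nbr = sim.topk(k, dim=-1).indices                          # (N, k) neighbor indices
    src = torch.arange(x.size(0)).repeat_interleave(k)
    return torch.stack([src, nbr.reshape(-1)])                 # (2, N*k) edge index

def thresholded_edges(x: torch.Tensor, tau: float) -> torch.Tensor:
    # Similarity-thresholded construction: keep edge (i, j) whenever the
    # normalized similarity is at least tau, so each node can have a
    # different number of neighbors (self-loops kept for simplicity).
    sim = F.normalize(x, dim=-1) @ F.normalize(x, dim=-1).T
    src, dst = torch.nonzero(sim >= tau, as_tuple=True)
    return torch.stack([src, dst])                             # (2, E), E depends on tau

# Hypothetical decreasing-threshold schedule: deeper layers use a lower
# threshold and therefore admit more neighbors per node.
num_layers = 12
taus = torch.linspace(0.9, 0.5, num_layers)

x = torch.randn(196, 64)          # e.g. 14x14 image patches with 64-dim features
edges_layer0 = thresholded_edges(x, taus[0].item())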
format Article
id doaj-art-c87a3f271b164d04835b6bed3110719d
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-c87a3f271b164d04835b6bed3110719d | 2025-01-31T23:04:41Z | eng | IEEE | IEEE Access | 2169-3536 | 2025-01-01 | Vol. 13, pp. 19379-19387 | DOI: 10.1109/ACCESS.2025.3531691 | Document 10845790 | SViG: A Similarity-Thresholded Approach for Vision Graph Neural Networks | Ismael Elsharkawi (https://orcid.org/0009-0002-8510-2884), Hossam Sharara (https://orcid.org/0000-0003-0042-9790), Ahmed Rafea (https://orcid.org/0000-0001-8109-1845), all with the Department of Computer Science and Engineering, The American University in Cairo, New Cairo, Egypt | https://ieeexplore.ieee.org/document/10845790/ | Graph Neural Networks; Vision Graph Neural Networks; Image Classification
spellingShingle Ismael Elsharkawi
Hossam Sharara
Ahmed Rafea
SViG: A Similarity-Thresholded Approach for Vision Graph Neural Networks
IEEE Access
Graph Neural Networks
Vision Graph Neural Networks
Image Classification
title SViG: A Similarity-Thresholded Approach for Vision Graph Neural Networks
title_full SViG: A Similarity-Thresholded Approach for Vision Graph Neural Networks
title_fullStr SViG: A Similarity-Thresholded Approach for Vision Graph Neural Networks
title_full_unstemmed SViG: A Similarity-Thresholded Approach for Vision Graph Neural Networks
title_short SViG: A Similarity-Thresholded Approach for Vision Graph Neural Networks
title_sort svig a similarity thresholded approach for vision graph neural networks
topic Graph Neural Networks
Vision Graph Neural Networks
Image Classification
url https://ieeexplore.ieee.org/document/10845790/
work_keys_str_mv AT ismaelelsharkawi svigasimilaritythresholdedapproachforvisiongraphneuralnetworks
AT hossamsharara svigasimilaritythresholdedapproachforvisiongraphneuralnetworks
AT ahmedrafea svigasimilaritythresholdedapproachforvisiongraphneuralnetworks