Synthetic graphs for link prediction benchmarking

Predicting missing links in complex networks requires algorithms that are able to explore statistical regularities in the existing data. Here we investigate the interplay between algorithm efficiency and network structures through the introduction of suitably-designed synthetic graphs. We propose a...

Full description

Saved in:
Bibliographic Details
Main Authors: Alexey Vlaskin, Eduardo G Altmann
Format: Article
Language:English
Published: IOP Publishing 2025-01-01
Series:Journal of Physics: Complexity
Subjects:
Online Access:https://doi.org/10.1088/2632-072X/ada07f
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1832590531062923264
author Alexey Vlaskin
Eduardo G Altmann
author_facet Alexey Vlaskin
Eduardo G Altmann
author_sort Alexey Vlaskin
collection DOAJ
description Predicting missing links in complex networks requires algorithms that are able to explore statistical regularities in the existing data. Here we investigate the interplay between algorithm efficiency and network structures through the introduction of suitably-designed synthetic graphs. We propose a family of random graphs that incorporates both micro-scale motifs and meso-scale communities, two ubiquitous structures in complex networks. A key contribution is the derivation of theoretical upper bounds for link prediction performance in our synthetic graphs, allowing us to estimate the predictability of the task and obtain an improved assessment of the performance of any method. Our results on the performance of classical methods (e.g. Stochastic Block Models, Node2Vec, GraphSage) show that the performance of all methods correlate with the theoretical predictability, that no single method is universally superior, and that each of the methods exploit different characteristics known to exist in large classes of networks. Our findings underline the need for careful consideration of graph structure when selecting a link prediction method and emphasize the value of comparing performance against synthetic benchmarks. We provide open-source code for generating these synthetic graphs, enabling further research on link prediction methods.
format Article
id doaj-art-4af360f18b3d4235a9a73cf3a69ca7e1
institution Kabale University
issn 2632-072X
language English
publishDate 2025-01-01
publisher IOP Publishing
record_format Article
series Journal of Physics: Complexity
spelling doaj-art-4af360f18b3d4235a9a73cf3a69ca7e12025-01-23T12:32:32ZengIOP PublishingJournal of Physics: Complexity2632-072X2025-01-016101500410.1088/2632-072X/ada07fSynthetic graphs for link prediction benchmarkingAlexey Vlaskin0https://orcid.org/0009-0005-1137-9436Eduardo G Altmann1https://orcid.org/0000-0002-1932-3710School of Mathematics and Statistics & Centre for Complex Systems, The University of Sydney , 2024 NSW, Sydney, AustraliaSchool of Mathematics and Statistics & Centre for Complex Systems, The University of Sydney , 2024 NSW, Sydney, AustraliaPredicting missing links in complex networks requires algorithms that are able to explore statistical regularities in the existing data. Here we investigate the interplay between algorithm efficiency and network structures through the introduction of suitably-designed synthetic graphs. We propose a family of random graphs that incorporates both micro-scale motifs and meso-scale communities, two ubiquitous structures in complex networks. A key contribution is the derivation of theoretical upper bounds for link prediction performance in our synthetic graphs, allowing us to estimate the predictability of the task and obtain an improved assessment of the performance of any method. Our results on the performance of classical methods (e.g. Stochastic Block Models, Node2Vec, GraphSage) show that the performance of all methods correlate with the theoretical predictability, that no single method is universally superior, and that each of the methods exploit different characteristics known to exist in large classes of networks. Our findings underline the need for careful consideration of graph structure when selecting a link prediction method and emphasize the value of comparing performance against synthetic benchmarks. We provide open-source code for generating these synthetic graphs, enabling further research on link prediction methods.https://doi.org/10.1088/2632-072X/ada07flink predictioncomplex networksstochastic block model
spellingShingle Alexey Vlaskin
Eduardo G Altmann
Synthetic graphs for link prediction benchmarking
Journal of Physics: Complexity
link prediction
complex networks
stochastic block model
title Synthetic graphs for link prediction benchmarking
title_full Synthetic graphs for link prediction benchmarking
title_fullStr Synthetic graphs for link prediction benchmarking
title_full_unstemmed Synthetic graphs for link prediction benchmarking
title_short Synthetic graphs for link prediction benchmarking
title_sort synthetic graphs for link prediction benchmarking
topic link prediction
complex networks
stochastic block model
url https://doi.org/10.1088/2632-072X/ada07f
work_keys_str_mv AT alexeyvlaskin syntheticgraphsforlinkpredictionbenchmarking
AT eduardogaltmann syntheticgraphsforlinkpredictionbenchmarking