Synthetic graphs for link prediction benchmarking
Predicting missing links in complex networks requires algorithms that are able to explore statistical regularities in the existing data. Here we investigate the interplay between algorithm efficiency and network structures through the introduction of suitably-designed synthetic graphs. We propose a...
Saved in:
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: |
IOP Publishing
2025-01-01
|
Series: | Journal of Physics: Complexity |
Subjects: | |
Online Access: | https://doi.org/10.1088/2632-072X/ada07f |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
_version_ | 1832590531062923264 |
---|---|
author | Alexey Vlaskin Eduardo G Altmann |
author_facet | Alexey Vlaskin Eduardo G Altmann |
author_sort | Alexey Vlaskin |
collection | DOAJ |
description | Predicting missing links in complex networks requires algorithms that are able to explore statistical regularities in the existing data. Here we investigate the interplay between algorithm efficiency and network structures through the introduction of suitably-designed synthetic graphs. We propose a family of random graphs that incorporates both micro-scale motifs and meso-scale communities, two ubiquitous structures in complex networks. A key contribution is the derivation of theoretical upper bounds for link prediction performance in our synthetic graphs, allowing us to estimate the predictability of the task and obtain an improved assessment of the performance of any method. Our results on the performance of classical methods (e.g. Stochastic Block Models, Node2Vec, GraphSage) show that the performance of all methods correlate with the theoretical predictability, that no single method is universally superior, and that each of the methods exploit different characteristics known to exist in large classes of networks. Our findings underline the need for careful consideration of graph structure when selecting a link prediction method and emphasize the value of comparing performance against synthetic benchmarks. We provide open-source code for generating these synthetic graphs, enabling further research on link prediction methods. |
format | Article |
id | doaj-art-4af360f18b3d4235a9a73cf3a69ca7e1 |
institution | Kabale University |
issn | 2632-072X |
language | English |
publishDate | 2025-01-01 |
publisher | IOP Publishing |
record_format | Article |
series | Journal of Physics: Complexity |
spelling | doaj-art-4af360f18b3d4235a9a73cf3a69ca7e12025-01-23T12:32:32ZengIOP PublishingJournal of Physics: Complexity2632-072X2025-01-016101500410.1088/2632-072X/ada07fSynthetic graphs for link prediction benchmarkingAlexey Vlaskin0https://orcid.org/0009-0005-1137-9436Eduardo G Altmann1https://orcid.org/0000-0002-1932-3710School of Mathematics and Statistics & Centre for Complex Systems, The University of Sydney , 2024 NSW, Sydney, AustraliaSchool of Mathematics and Statistics & Centre for Complex Systems, The University of Sydney , 2024 NSW, Sydney, AustraliaPredicting missing links in complex networks requires algorithms that are able to explore statistical regularities in the existing data. Here we investigate the interplay between algorithm efficiency and network structures through the introduction of suitably-designed synthetic graphs. We propose a family of random graphs that incorporates both micro-scale motifs and meso-scale communities, two ubiquitous structures in complex networks. A key contribution is the derivation of theoretical upper bounds for link prediction performance in our synthetic graphs, allowing us to estimate the predictability of the task and obtain an improved assessment of the performance of any method. Our results on the performance of classical methods (e.g. Stochastic Block Models, Node2Vec, GraphSage) show that the performance of all methods correlate with the theoretical predictability, that no single method is universally superior, and that each of the methods exploit different characteristics known to exist in large classes of networks. Our findings underline the need for careful consideration of graph structure when selecting a link prediction method and emphasize the value of comparing performance against synthetic benchmarks. We provide open-source code for generating these synthetic graphs, enabling further research on link prediction methods.https://doi.org/10.1088/2632-072X/ada07flink predictioncomplex networksstochastic block model |
spellingShingle | Alexey Vlaskin Eduardo G Altmann Synthetic graphs for link prediction benchmarking Journal of Physics: Complexity link prediction complex networks stochastic block model |
title | Synthetic graphs for link prediction benchmarking |
title_full | Synthetic graphs for link prediction benchmarking |
title_fullStr | Synthetic graphs for link prediction benchmarking |
title_full_unstemmed | Synthetic graphs for link prediction benchmarking |
title_short | Synthetic graphs for link prediction benchmarking |
title_sort | synthetic graphs for link prediction benchmarking |
topic | link prediction complex networks stochastic block model |
url | https://doi.org/10.1088/2632-072X/ada07f |
work_keys_str_mv | AT alexeyvlaskin syntheticgraphsforlinkpredictionbenchmarking AT eduardogaltmann syntheticgraphsforlinkpredictionbenchmarking |