Statistical Validity of Neural-Net Benchmarks

Claims of better, faster, or more efficient neural-net designs often hinge on low single-digit percentage improvements (or less) in accuracy or speed over competing designs. The benchmark differences used for such comparisons have been based on a number of different metrics, such as recall, the best of five runs, the median of five runs, Top-1, Top-5, BLEU, ROC, RMS, etc. These metrics implicitly assert that the underlying metric distributions are comparable. Conspicuous by their absence are measures of the statistical validity of these benchmark comparisons. This study examined neural-net benchmark metric distributions and determined that there are researcher degrees of freedom that may affect comparison validity. An essay is developed and proposed for benchmarking and comparing reasonably expected neural-net performance metrics that minimizes researcher degrees of freedom.
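
The subject keywords below include "Bayesian credible interval". As a rough illustration of that idea, and not the paper's actual protocol, the following sketch compares two designs' per-run accuracies via a credible interval on the difference in mean accuracy, using Rubin's Bayesian bootstrap; the run values, names, and choice of method are all assumptions for illustration:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical top-1 accuracies from 8 independent training runs per design.
    acc_a = np.array([0.912, 0.915, 0.909, 0.918, 0.911, 0.914, 0.910, 0.916])
    acc_b = np.array([0.921, 0.913, 0.925, 0.917, 0.922, 0.915, 0.924, 0.919])

    def posterior_mean_draws(runs, n_draws=10_000):
        # Rubin's Bayesian bootstrap: draw Dirichlet(1, ..., 1) weights over
        # the observed runs; each weighted average is one posterior draw of
        # the design's expected benchmark metric.
        weights = rng.dirichlet(np.ones(len(runs)), size=n_draws)
        return weights @ runs

    diff = posterior_mean_draws(acc_b) - posterior_mean_draws(acc_a)
    lo, hi = np.percentile(diff, [2.5, 97.5])
    print(f"posterior mean improvement (B - A): {diff.mean():+.4f}")
    print(f"95% credible interval: [{lo:+.4f}, {hi:+.4f}]")
    # If the interval includes zero, the claimed improvement may be an
    # artifact of run-to-run variation rather than a real difference.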
The essay includes an estimate of the effects and the interactions of hyper-parameter settings on a neural-net's benchmark metrics as a measure of its optimization complexity.
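
As a rough sketch of how such effects and interactions can be estimated with a factorial design (another of the subject keywords below), the following illustrative example computes main effects and a two-way interaction from a hypothetical 2x2 experiment; every cell value and factor choice is invented:

    import numpy as np

    # Hypothetical 2x2 factorial over two hyper-parameters, coded -1/+1:
    # learning rate (low/high) and batch size (small/large). Each cell holds
    # per-run accuracies for that setting combination (all values invented).
    cells = {
        (-1, -1): [0.903, 0.907, 0.905],
        (-1, +1): [0.915, 0.913, 0.918],
        (+1, -1): [0.921, 0.924, 0.920],
        (+1, +1): [0.910, 0.908, 0.912],
    }
    m = {k: np.mean(v) for k, v in cells.items()}

    # Standard 2^2 factorial contrasts: a main effect is the average change
    # in the metric as a factor moves from its low to its high level; the
    # interaction measures how one factor's effect depends on the other.
    lr_main = ((m[(+1, -1)] + m[(+1, +1)]) - (m[(-1, -1)] + m[(-1, +1)])) / 2
    bs_main = ((m[(-1, +1)] + m[(+1, +1)]) - (m[(-1, -1)] + m[(+1, -1)])) / 2
    lr_x_bs = ((m[(+1, +1)] + m[(-1, -1)]) - (m[(+1, -1)] + m[(-1, +1)])) / 2

    print(f"learning-rate main effect: {lr_main:+.4f}")
    print(f"batch-size main effect:    {bs_main:+.4f}")
    print(f"lr x bs interaction:       {lr_x_bs:+.4f}")
    # Large effects or interactions indicate benchmark numbers that are
    # sensitive to tuning, i.e., higher optimization complexity.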

Bibliographic Details
Main Authors: Alain Hadges (Harrisburg University of Science & Technology, Harrisburg, PA, USA; ORCID: 0009-0003-7996-6528), Srikar Bellur (Department of Data Analytics, Harrisburg University of Science & Technology, Harrisburg, PA, USA)
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Open Journal of the Computer Society, Vol. 6, pp. 211-222
DOI: 10.1109/OJCS.2024.3523183
ISSN: 2644-1268
Collection: DOAJ
Subjects: Bayesian credible interval; benchmark essay; comparison; factorial experiment; hyper-parameters; machine learning
Online Access: https://ieeexplore.ieee.org/document/10816528/