Statistical Validity of Neural-Net Benchmarks
Claims of better, faster or more efficient neural-net designs often hinge on low single-digit percentage improvements (or less) in accuracy or speed compared to others. Current benchmark differences used for comparison have been based on a number of different metrics such as recall, the best of five runs, the median of five runs, Top-1, Top-5, BLEU, ROC, RMS, etc. These metrics implicitly assert comparable distributions of metrics. Conspicuous by their absence are measures of statistical validity of these benchmark comparisons. This study examined neural-net benchmark metric distributions and determined there are researcher degrees of freedom that may affect comparison validity. An essay is developed and proposed for benchmarking and comparing reasonably expected neural-net performance metrics that minimizes researcher degrees of freedom. The essay includes an estimate of the effects and the interactions of hyper-parameter settings on the benchmark metrics of a neural-net as a measure of its optimization complexity.
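The abstract's point about researcher degrees of freedom can be illustrated with a small, hypothetical simulation (not taken from the paper; all names and numbers here are illustrative assumptions). Two models with *identical* run-to-run accuracy distributions can still appear to differ under "best of five", and a resampling interval on the difference makes the run-to-run noise visible. A bootstrap confidence interval stands in here for the paper's Bayesian credible interval, purely as a simpler sketch of the same idea:

```python
import random
import statistics

def run_metric(rng, mean=0.760, sd=0.004):
    """One simulated benchmark run: accuracy drawn from the model's
    run-to-run distribution (identical for both models)."""
    return rng.gauss(mean, sd)

rng = random.Random(0)

# Two hypothetical models with the SAME underlying accuracy distribution.
runs_a = [run_metric(rng) for _ in range(5)]
runs_b = [run_metric(rng) for _ in range(5)]

# "Best of five" vs "median of five" are researcher degrees of freedom:
# the same ten runs can rank the two models differently.
print(f"best-of-5 difference:   {max(runs_a) - max(runs_b):+.4f}")
print(f"median-of-5 difference: "
      f"{statistics.median(runs_a) - statistics.median(runs_b):+.4f}")

def bootstrap_diff_interval(xs, ys, n_boot=10_000, alpha=0.05, seed=1):
    """Percentile bootstrap interval for the difference of mean metrics."""
    r = random.Random(seed)
    diffs = sorted(
        statistics.mean(r.choices(xs, k=len(xs)))
        - statistics.mean(r.choices(ys, k=len(ys)))
        for _ in range(n_boot)
    )
    lo = diffs[int(alpha / 2 * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

lo, hi = bootstrap_diff_interval(runs_a, runs_b)
print(f"95% bootstrap interval for mean difference: [{lo:+.4f}, {hi:+.4f}]")
# If the interval contains 0, the apparent "improvement" is not
# distinguishable from run-to-run noise.
```

A single best-of-five number hides the spread that this interval exposes, which is the comparison-validity gap the article addresses.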
Main Authors: | Alain Hadges, Srikar Bellur |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Open Journal of the Computer Society |
Subjects: | Bayesian credible interval; benchmark essay; comparison; factorial experiment; hyper-parameters; machine learning |
Online Access: | https://ieeexplore.ieee.org/document/10816528/ |
_version_ | 1832542621441982464 |
---|---|
author | Alain Hadges Srikar Bellur |
author_facet | Alain Hadges Srikar Bellur |
author_sort | Alain Hadges |
collection | DOAJ |
description | Claims of better, faster or more efficient neural-net designs often hinge on low single digit percentage improvements (or less) in accuracy or speed compared to others. Current benchmark differences used for comparison have been based on a number of different metrics such as recall, the best of five-runs, the median of five runs, Top-1, Top-5, BLEU, ROC, RMS, etc. These metrics implicitly assert comparable distributions of metrics. Conspicuous by their absence are measures of statistical validity of these benchmark comparisons. This study examined neural-net benchmark metric distributions and determined there are researcher degrees of freedom that may affect comparison validity. An essay is developed and proposed for benchmarking and comparing reasonably expected neural-net performance metrics that minimizes researcher degrees of freedom. The essay includes an estimate of the effects and the interactions of hyper-parameter settings on the benchmark metrics of a neural-net as a measure of its optimization complexity. |
format | Article |
id | doaj-art-ea33a80f195c40cfa155b8e148d6ba9d |
institution | Kabale University |
issn | 2644-1268 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Open Journal of the Computer Society |
spelling | doaj-art-ea33a80f195c40cfa155b8e148d6ba9d; 2025-02-04T00:00:51Z; eng; IEEE; IEEE Open Journal of the Computer Society; ISSN 2644-1268; 2025-01-01; Vol. 6, pp. 211-222; DOI 10.1109/OJCS.2024.3523183; article 10816528; Statistical Validity of Neural-Net Benchmarks; Alain Hadges (https://orcid.org/0009-0003-7996-6528), Harrisburg University of Science & Technology, Harrisburg, PA, USA; Srikar Bellur, Department of Data Analytics, Harrisburg University of Science & Technology, Harrisburg, PA, USA; Claims of better, faster or more efficient neural-net designs often hinge on low single digit percentage improvements (or less) in accuracy or speed compared to others. Current benchmark differences used for comparison have been based on a number of different metrics such as recall, the best of five-runs, the median of five runs, Top-1, Top-5, BLEU, ROC, RMS, etc. These metrics implicitly assert comparable distributions of metrics. Conspicuous by their absence are measures of statistical validity of these benchmark comparisons. This study examined neural-net benchmark metric distributions and determined there are researcher degrees of freedom that may affect comparison validity. An essay is developed and proposed for benchmarking and comparing reasonably expected neural-net performance metrics that minimizes researcher degrees of freedom. The essay includes an estimate of the effects and the interactions of hyper-parameter settings on the benchmark metrics of a neural-net as a measure of its optimization complexity. https://ieeexplore.ieee.org/document/10816528/; Bayesian credible interval; benchmark essay; comparison; factorial experiment; hyper-parameters; machine learning |
title | Statistical Validity of Neural-Net Benchmarks |
topic | Bayesian credible interval benchmark essay comparison factorial experiment hyper-parameters machine learning |
url | https://ieeexplore.ieee.org/document/10816528/ |