AIPerf: Automated Machine Learning as an AI-HPC Benchmark

The plethora of complex Artificial Intelligence (AI) algorithms and the available High-Performance Computing (HPC) power stimulate the rapid development of AI components with heterogeneous designs. Consequently, the need for cross-stack performance benchmarking of AI-HPC systems has rapidly emerged. In particular, LINPACK, the de facto HPC benchmark, cannot reflect AI computing power or input/output performance because it lacks a representative AI workload. Current popular AI benchmarks, such as MLPerf, have a fixed problem size and therefore limited scalability. To address these issues, we propose an end-to-end benchmark suite built on automated machine learning, which not only represents real AI scenarios but also scales auto-adaptively to machines of various sizes. We implement the algorithms in a highly parallel and flexible way to ensure efficiency and optimization potential on diverse systems with customizable configurations. We use Operations Per Second (OPS), measured in an analytical and systematic manner, as the major metric to quantify AI performance. We evaluate the benchmark's stability and scalability on various systems, from 4 nodes with 32 NVIDIA Tesla T4 GPUs (56.1 Tera-OPS measured) up to 512 nodes with 4096 Huawei Ascend 910 processors (194.53 Peta-OPS measured), and the results show near-linear weak scalability. With a flexible workload and a single metric, AIPerf can easily scale on and rank AI-HPC systems, providing a powerful benchmark suite for the coming supercomputing era.
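As a rough illustration of the OPS metric described above, here is a minimal Python sketch of analytical operation counting. It assumes a generic FLOP-counting approach rather than AIPerf's published accounting; every function name and figure below is an illustrative assumption, not part of AIPerf.

```python
# Minimal sketch of analytical OPS accounting (an assumed approach for
# illustration; not AIPerf's actual method or API).

def dense_ops(batch: int, in_features: int, out_features: int) -> int:
    """Multiply-add count for a fully connected layer: 2 ops per weight per sample."""
    return 2 * batch * in_features * out_features

def conv2d_ops(batch: int, h_out: int, w_out: int,
               c_in: int, c_out: int, k_h: int, k_w: int) -> int:
    """Multiply-add count for a 2-D convolution over all output positions."""
    return 2 * batch * h_out * w_out * c_out * c_in * k_h * k_w

def ops_per_second(ops_per_step: int, step_seconds: float) -> float:
    """Operations Per Second: analytically counted work divided by wall time."""
    return ops_per_step / step_seconds

if __name__ == "__main__":
    # Hypothetical first convolution of a ResNet-style network, batch size 256.
    step_ops = conv2d_ops(batch=256, h_out=112, w_out=112,
                          c_in=3, c_out=64, k_h=7, k_w=7)
    # Hypothetical 50 ms training step.
    print(f"{ops_per_second(step_ops, 0.05) / 1e12:.2f} Tera-OPS")
```

Applied to the figures quoted in the abstract, the same division gives roughly 1.75 Tera-OPS per Tesla T4 (56.1 Tera-OPS / 32 devices) and roughly 47.5 Tera-OPS per Ascend 910 (194.53 Peta-OPS / 4096 devices), one simple way to compare per-device throughput across the two runs.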

Bibliographic Details
Main Authors: Zhixiang Ren, Yongheng Liu, Tianhui Shi, Lei Xie, Yue Zhou, Jidong Zhai, Youhui Zhang, Yunquan Zhang, Wenguang Chen
Format: Article
Language: English
Published: Tsinghua University Press, 2021-09-01
Series: Big Data Mining and Analytics
Subjects: high-performance computing (HPC); artificial intelligence (AI); automated machine learning
Online Access: https://www.sciopen.com/article/10.26599/BDMA.2021.9020004
author Zhixiang Ren
Yongheng Liu
Tianhui Shi
Lei Xie
Yue Zhou
Jidong Zhai
Youhui Zhang
Yunquan Zhang
Wenguang Chen
collection DOAJ
description The plethora of complex Artificial Intelligence (AI) algorithms and the available High-Performance Computing (HPC) power stimulate the rapid development of AI components with heterogeneous designs. Consequently, the need for cross-stack performance benchmarking of AI-HPC systems has rapidly emerged. In particular, LINPACK, the de facto HPC benchmark, cannot reflect AI computing power or input/output performance because it lacks a representative AI workload. Current popular AI benchmarks, such as MLPerf, have a fixed problem size and therefore limited scalability. To address these issues, we propose an end-to-end benchmark suite built on automated machine learning, which not only represents real AI scenarios but also scales auto-adaptively to machines of various sizes. We implement the algorithms in a highly parallel and flexible way to ensure efficiency and optimization potential on diverse systems with customizable configurations. We use Operations Per Second (OPS), measured in an analytical and systematic manner, as the major metric to quantify AI performance. We evaluate the benchmark's stability and scalability on various systems, from 4 nodes with 32 NVIDIA Tesla T4 GPUs (56.1 Tera-OPS measured) up to 512 nodes with 4096 Huawei Ascend 910 processors (194.53 Peta-OPS measured), and the results show near-linear weak scalability. With a flexible workload and a single metric, AIPerf can easily scale on and rank AI-HPC systems, providing a powerful benchmark suite for the coming supercomputing era.
format Article
id doaj-art-62df745ba665431ca9caa42ffd7d5f21
institution Kabale University
issn 2096-0654
language English
publishDate 2021-09-01
publisher Tsinghua University Press
record_format Article
series Big Data Mining and Analytics
spelling doaj-art-62df745ba665431ca9caa42ffd7d5f21 (2025-02-02T23:47:26Z)
Big Data Mining and Analytics, Vol. 4, No. 3 (2021-09-01), pp. 208-220; doi: 10.26599/BDMA.2021.9020004
Author affiliations:
Zhixiang Ren (Peng Cheng National Laboratory, Shenzhen 518000, China)
Yongheng Liu (Peng Cheng National Laboratory, Shenzhen 518000, China)
Tianhui Shi (Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China)
Lei Xie (Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China)
Yue Zhou (Peng Cheng National Laboratory, Shenzhen 518000, China)
Jidong Zhai (Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China)
Youhui Zhang (Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China)
Yunquan Zhang (Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100086, China)
Wenguang Chen (Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China)
title AIPerf: Automated Machine Learning as an AI-HPC Benchmark
topic high-performance computing (hpc)
artificial intelligence (ai)
automated machine learning
url https://www.sciopen.com/article/10.26599/BDMA.2021.9020004