AIPerf: Automated Machine Learning as an AI-HPC Benchmark
The plethora of complex Artificial Intelligence (AI) algorithms and available High-Performance Computing (HPC) power stimulates the expeditious development of AI components with heterogeneous designs. Consequently, the need for cross-stack performance benchmarking of AI-HPC systems has rapidly emerg...
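Main Authors: | Zhixiang Ren, Yongheng Liu, Tianhui Shi, Lei Xie, Yue Zhou, Jidong Zhai, Youhui Zhang, Yunquan Zhang, Wenguang Chen |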
---|---|
Format: | Article |
Language: | English |
Published: | Tsinghua University Press, 2021-09-01 |
Series: | Big Data Mining and Analytics |
Subjects: | high-performance computing (hpc); artificial intelligence (ai); automated machine learning |
Online Access: | https://www.sciopen.com/article/10.26599/BDMA.2021.9020004 |
_version_ | 1832568911643541504 |
---|---|
author | Zhixiang Ren Yongheng Liu Tianhui Shi Lei Xie Yue Zhou Jidong Zhai Youhui Zhang Yunquan Zhang Wenguang Chen |
author_facet | Zhixiang Ren Yongheng Liu Tianhui Shi Lei Xie Yue Zhou Jidong Zhai Youhui Zhang Yunquan Zhang Wenguang Chen |
author_sort | Zhixiang Ren |
collection | DOAJ |
description | The plethora of complex Artificial Intelligence (AI) algorithms and available High-Performance Computing (HPC) power stimulates the expeditious development of AI components with heterogeneous designs. Consequently, the need for cross-stack performance benchmarking of AI-HPC systems has rapidly emerged. In particular, the de facto HPC benchmark, LINPACK, cannot reflect the AI computing power and input/output performance without a representative workload. Current popular AI benchmarks, such as MLPerf, have a fixed problem size and therefore limited scalability. To address these issues, we propose an end-to-end benchmark suite utilizing automated machine learning, which not only represents real AI scenarios, but also is auto-adaptively scalable to various scales of machines. We implement the algorithms in a highly parallel and flexible way to ensure the efficiency and optimization potential on diverse systems with customizable configurations. We utilize Operations Per Second (OPS), which is measured in an analytical and systematic approach, as a major metric to quantify the AI performance. We perform evaluations on various systems to ensure the benchmark’s stability and scalability, from 4 nodes with 32 NVIDIA Tesla T4 (56.1 Tera-OPS measured) up to 512 nodes with 4096 Huawei Ascend 910 (194.53 Peta-OPS measured), and the results show near-linear weak scalability. With a flexible workload and single metric, AIPerf can easily scale on and rank AI-HPC, providing a powerful benchmark suite for the coming supercomputing era. |
format | Article |
id | doaj-art-62df745ba665431ca9caa42ffd7d5f21 |
institution | Kabale University |
issn | 2096-0654 |
language | English |
publishDate | 2021-09-01 |
publisher | Tsinghua University Press |
record_format | Article |
series | Big Data Mining and Analytics |
spelling | doaj-art-62df745ba665431ca9caa42ffd7d5f21 2025-02-02T23:47:26Z eng Tsinghua University Press Big Data Mining and Analytics 2096-0654 2021-09-01 4 3 208 220 10.26599/BDMA.2021.9020004 AIPerf: Automated Machine Learning as an AI-HPC Benchmark Zhixiang Ren (Peng Cheng National Laboratory, Shenzhen 518000, China) Yongheng Liu (Peng Cheng National Laboratory, Shenzhen 518000, China) Tianhui Shi (Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China) Lei Xie (Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China) Yue Zhou (Peng Cheng National Laboratory, Shenzhen 518000, China) Jidong Zhai (Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China) Youhui Zhang (Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China) Yunquan Zhang (Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100086, China) Wenguang Chen (Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China) The plethora of complex Artificial Intelligence (AI) algorithms and available High-Performance Computing (HPC) power stimulates the expeditious development of AI components with heterogeneous designs. Consequently, the need for cross-stack performance benchmarking of AI-HPC systems has rapidly emerged. In particular, the de facto HPC benchmark, LINPACK, cannot reflect the AI computing power and input/output performance without a representative workload. Current popular AI benchmarks, such as MLPerf, have a fixed problem size and therefore limited scalability. To address these issues, we propose an end-to-end benchmark suite utilizing automated machine learning, which not only represents real AI scenarios, but also is auto-adaptively scalable to various scales of machines. We implement the algorithms in a highly parallel and flexible way to ensure the efficiency and optimization potential on diverse systems with customizable configurations. We utilize Operations Per Second (OPS), which is measured in an analytical and systematic approach, as a major metric to quantify the AI performance. We perform evaluations on various systems to ensure the benchmark’s stability and scalability, from 4 nodes with 32 NVIDIA Tesla T4 (56.1 Tera-OPS measured) up to 512 nodes with 4096 Huawei Ascend 910 (194.53 Peta-OPS measured), and the results show near-linear weak scalability. With a flexible workload and single metric, AIPerf can easily scale on and rank AI-HPC, providing a powerful benchmark suite for the coming supercomputing era. https://www.sciopen.com/article/10.26599/BDMA.2021.9020004 high-performance computing (hpc) artificial intelligence (ai) automated machine learning |
spellingShingle | Zhixiang Ren Yongheng Liu Tianhui Shi Lei Xie Yue Zhou Jidong Zhai Youhui Zhang Yunquan Zhang Wenguang Chen AIPerf: Automated Machine Learning as an AI-HPC Benchmark Big Data Mining and Analytics high-performance computing (hpc) artificial intelligence (ai) automated machine learning |
title | AIPerf: Automated Machine Learning as an AI-HPC Benchmark |
title_full | AIPerf: Automated Machine Learning as an AI-HPC Benchmark |
title_fullStr | AIPerf: Automated Machine Learning as an AI-HPC Benchmark |
title_full_unstemmed | AIPerf: Automated Machine Learning as an AI-HPC Benchmark |
title_short | AIPerf: Automated Machine Learning as an AI-HPC Benchmark |
title_sort | aiperf automated machine learning as an ai hpc benchmark |
topic | high-performance computing (hpc) artificial intelligence (ai) automated machine learning |
url | https://www.sciopen.com/article/10.26599/BDMA.2021.9020004 |
work_keys_str_mv | AT zhixiangren aiperfautomatedmachinelearningasanaihpcbenchmark AT yonghengliu aiperfautomatedmachinelearningasanaihpcbenchmark AT tianhuishi aiperfautomatedmachinelearningasanaihpcbenchmark AT leixie aiperfautomatedmachinelearningasanaihpcbenchmark AT yuezhou aiperfautomatedmachinelearningasanaihpcbenchmark AT jidongzhai aiperfautomatedmachinelearningasanaihpcbenchmark AT youhuizhang aiperfautomatedmachinelearningasanaihpcbenchmark AT yunquanzhang aiperfautomatedmachinelearningasanaihpcbenchmark AT wenguangchen aiperfautomatedmachinelearningasanaihpcbenchmark |