A Hybrid Scale-Up and Scale-Out Approach for Performance and Energy Efficiency Optimization in Systolic Array Accelerators
The rapid development of deep neural networks (DNNs), such as convolutional neural networks and transformer-based large language models, has significantly advanced AI applications. However, these advances have introduced substantial computational and data demands, presenting challenges for the devel...
| Main Authors: | Hao Sun, Junzhong Shen, Changwu Zhang, Hengzhu Liu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-03-01 |
| Series: | Micromachines |
| Subjects: | systolic array; deep neural network; performance optimization; energy efficiency; accelerators |
| Online Access: | https://www.mdpi.com/2072-666X/16/3/336 |
| _version_ | 1850091472986046464 |
|---|---|
| author | Hao Sun; Junzhong Shen; Changwu Zhang; Hengzhu Liu |
| author_facet | Hao Sun; Junzhong Shen; Changwu Zhang; Hengzhu Liu |
| author_sort | Hao Sun |
| collection | DOAJ |
| description | The rapid development of deep neural networks (DNNs), such as convolutional neural networks and transformer-based large language models, has significantly advanced AI applications. However, these advances have introduced substantial computational and data demands, presenting challenges for the development of systolic array accelerators, which excel in tensor operations. Systolic array accelerators are typically developed using two approaches: scale-up, which increases the size of a single array, and scale-out, which involves multiple parallel arrays of fixed size. Scale-up achieves high performance in large-scale matrix multiplications, while scale-out offers better energy efficiency for lower-dimensional matrix multiplications. However, neither approach can simultaneously maintain both high performance and high energy efficiency across the full spectrum of DNN tasks. In this work, we propose a hybrid approach that integrates scale-up and scale-out techniques. We use mapping space exploration in a multi-tenant application environment to assign DNN operations to specific systolic array modules, thereby optimizing performance and energy efficiency. Experiments show that our proposed hybrid systolic array accelerator reduces energy consumption by up to 8% on average and improves throughput by up to 57% on average, compared to TPUv3 across various DNN models. |
| format | Article |
| id | doaj-art-a3faa470041c4c20a2711eabb4d44ee5 |
| institution | DOAJ |
| issn | 2072-666X |
| language | English |
| publishDate | 2025-03-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Micromachines |
| spelling | doaj-art-a3faa470041c4c20a2711eabb4d44ee5; 2025-08-20T02:42:22Z; eng; MDPI AG; Micromachines; 2072-666X; 2025-03-01; 16; 3; 336; 10.3390/mi16030336; A Hybrid Scale-Up and Scale-Out Approach for Performance and Energy Efficiency Optimization in Systolic Array Accelerators; Hao Sun (0); Junzhong Shen (1); Changwu Zhang (2); Hengzhu Liu (3); (0) College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China; (1) College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China; (2) Academy of Military Science, Beijing 100091, China; (3) College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China; The rapid development of deep neural networks (DNNs), such as convolutional neural networks and transformer-based large language models, has significantly advanced AI applications. However, these advances have introduced substantial computational and data demands, presenting challenges for the development of systolic array accelerators, which excel in tensor operations. Systolic array accelerators are typically developed using two approaches: scale-up, which increases the size of a single array, and scale-out, which involves multiple parallel arrays of fixed size. Scale-up achieves high performance in large-scale matrix multiplications, while scale-out offers better energy efficiency for lower-dimensional matrix multiplications. However, neither approach can simultaneously maintain both high performance and high energy efficiency across the full spectrum of DNN tasks. In this work, we propose a hybrid approach that integrates scale-up and scale-out techniques. We use mapping space exploration in a multi-tenant application environment to assign DNN operations to specific systolic array modules, thereby optimizing performance and energy efficiency. Experiments show that our proposed hybrid systolic array accelerator reduces energy consumption by up to 8% on average and improves throughput by up to 57% on average, compared to TPUv3 across various DNN models.; https://www.mdpi.com/2072-666X/16/3/336; systolic array; deep neural network; performance optimization; energy efficiency; accelerators |
| spellingShingle | Hao Sun Junzhong Shen Changwu Zhang Hengzhu Liu A Hybrid Scale-Up and Scale-Out Approach for Performance and Energy Efficiency Optimization in Systolic Array Accelerators Micromachines systolic array deep neural network performance optimization energy efficiency accelerators |
| title | A Hybrid Scale-Up and Scale-Out Approach for Performance and Energy Efficiency Optimization in Systolic Array Accelerators |
| title_sort | hybrid scale up and scale out approach for performance and energy efficiency optimization in systolic array accelerators |
| topic | systolic array; deep neural network; performance optimization; energy efficiency; accelerators |
| url | https://www.mdpi.com/2072-666X/16/3/336 |
| work_keys_str_mv | AT haosun ahybridscaleupandscaleoutapproachforperformanceandenergyefficiencyoptimizationinsystolicarrayaccelerators AT junzhongshen ahybridscaleupandscaleoutapproachforperformanceandenergyefficiencyoptimizationinsystolicarrayaccelerators AT changwuzhang ahybridscaleupandscaleoutapproachforperformanceandenergyefficiencyoptimizationinsystolicarrayaccelerators AT hengzhuliu ahybridscaleupandscaleoutapproachforperformanceandenergyefficiencyoptimizationinsystolicarrayaccelerators AT haosun hybridscaleupandscaleoutapproachforperformanceandenergyefficiencyoptimizationinsystolicarrayaccelerators AT junzhongshen hybridscaleupandscaleoutapproachforperformanceandenergyefficiencyoptimizationinsystolicarrayaccelerators AT changwuzhang hybridscaleupandscaleoutapproachforperformanceandenergyefficiencyoptimizationinsystolicarrayaccelerators AT hengzhuliu hybridscaleupandscaleoutapproachforperformanceandenergyefficiencyoptimizationinsystolicarrayaccelerators |
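
The description field above contrasts scale-up (one large array) with scale-out (several small arrays of fixed size) and mentions assigning DNN operations to specific array modules. The following is a minimal sketch of that trade-off, not the authors' mapping space exploration: it assumes a toy metric (the fraction of processing elements holding a real output element after tiling) and a greedy rule that prefers higher utilization, breaking ties in favor of the larger array. The names `Array`, `spatial_utilization`, `assign`, and the array and operation sizes are all hypothetical and do not come from the record.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Array:
    """A hypothetical systolic array of rows x cols processing elements."""
    name: str
    rows: int
    cols: int


def spatial_utilization(m: int, n: int, arr: Array) -> float:
    """Fraction of PEs that hold a real output element when an m x n
    output matrix is tiled onto `arr` (edge tiles are zero-padded)."""
    padded = ceil(m / arr.rows) * arr.rows * ceil(n / arr.cols) * arr.cols
    return (m * n) / padded


def assign(ops: dict, arrays: list) -> dict:
    """Greedy stand-in for mapping: per operation, pick the array with the
    highest spatial utilization; on a tie, pick the array with more PEs."""
    plan = {}
    for op, (m, n) in ops.items():
        best = max(
            arrays,
            key=lambda a: (spatial_utilization(m, n, a), a.rows * a.cols),
        )
        plan[op] = best.name
    return plan


# Assumed sizes, chosen only for illustration.
ARRAYS = [Array("scale_up_128x128", 128, 128), Array("scale_out_32x32", 32, 32)]
OPS = {"large_gemm": (1024, 1024), "small_gemm": (32, 32)}

if __name__ == "__main__":
    for op, target in assign(OPS, ARRAYS).items():
        print(op, "->", target)
```

Under these assumptions a 32 x 32 output occupies only 1/16 of a 128 x 128 array, so the sketch sends it to the small array, while the large GEMM fills both array types and goes to the larger one. A realistic model would also account for memory bandwidth, pipeline fill and drain time, and per-array energy, none of which are given in the record.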