A Graph-Based Learning Framework for Compiler Loop Auto-Vectorization

Bibliographic Details
Main Authors: Yao Xiao, Nesreen K. Ahmed, Mihai Capotă, Guixiang Ma, Theodore L. Willke, Shahin Nazarian, Paul Bogdan
Format: Article
Language: English
Published: American Association for the Advancement of Science (AAAS) 2025-01-01
Series: Intelligent Computing
Online Access:https://spj.science.org/doi/10.34133/icomputing.0113
Description
Summary: The single instruction, multiple data (SIMD) capability of modern processors is critical to improving the performance of compute-intensive programs. Modern compilers exploit this capability through vectorization: they detect data parallelism in scalar source code and transform groups of scalar instructions into vector instructions. In this study, we focus on one of the most common vectorization techniques, loop-based vectorization, which targets loops and optimizes their performance by grouping multiple occurrences of the same operation across loop iterations into a single SIMD instruction. We propose a data-driven, graph-based learning framework for automatic vectorization, called autograph, which takes an input program, extracts its loops, and learns a structured representation to predict the correct vectorization and interleaving factors. The framework uses deep reinforcement learning to learn an optimal policy (mapping observations to actions) for an intelligent agent in a SIMD environment, and automatically injects the predicted vectorization pragmas into the input program. We conducted an extensive evaluation on multiple benchmark datasets, comparing against state-of-the-art baselines. Our results show that autograph achieves an average performance improvement of 2.49× over NeuroVectorizer and 3.69× over the -O3 baseline on Polybench.
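To illustrate the kind of output the abstract describes, here is a minimal sketch of a loop annotated with Clang's loop-vectorization pragma. The specific factor values (width 4, interleave count 2) are illustrative assumptions for this example, not values reported in the article; in autograph they would be predicted per loop by the learned policy.

```c
#include <assert.h>

/* Sketch: a vectorizable loop with an injected Clang loop pragma.
   vectorize_width(4) sets the vectorization factor; interleave_count(2)
   sets the interleaving factor. Non-Clang compilers ignore the pragma. */
void vec_add(const float *a, const float *b, float *c, int n) {
    #pragma clang loop vectorize_width(4) interleave_count(2)
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];  /* same operation across iterations -> SIMD */
}
```

The semantics of the loop are unchanged; the pragma only guides the compiler's cost model, which is why a learned predictor can safely inject such hints.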
ISSN:2771-5892