Pipelined Training with Stale Weights in Deep Convolutional Neural Networks

Bibliographic Details
Main Authors: Lifu Zhang, Tarek S. Abdelrahman
Author Affiliation: Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada
Format: Article
Language: English
Published: Wiley, 2021-01-01
Series: Applied Computational Intelligence and Soft Computing
ISSN: 1687-9724, 1687-9732
Online Access: http://dx.doi.org/10.1155/2021/3839543

Abstract
The growth in size and complexity of convolutional neural networks (CNNs) is forcing the partitioning of a network across multiple accelerators during training and pipelining of backpropagation computations over these accelerators. Pipelining results in the use of stale weights. Existing approaches to pipelined training avoid or limit the use of stale weights with techniques that either underutilize accelerators or increase training memory footprint. This paper contributes a pipelined backpropagation scheme that uses stale weights to maximize accelerator utilization and keep memory overhead modest. It explores the impact of stale weights on statistical efficiency and performance using 4 CNNs (LeNet-5, AlexNet, VGG, and ResNet) and shows that when pipelining is introduced in early layers, training with stale weights converges and results in models with inference accuracies comparable to those from nonpipelined training (a drop in accuracy of 0.4%, 4%, 0.83%, and 1.45% for the 4 networks, respectively). However, when pipelining is deeper in the network, inference accuracies drop significantly (up to 12% for VGG and 8.5% for ResNet-20). The paper also contributes a hybrid training scheme that combines pipelined with nonpipelined training to address this drop. The potential for performance improvement of the proposed scheme is demonstrated with a proof-of-concept pipelined backpropagation implementation in PyTorch on 2 GPUs using ResNet-56/110/224/362, achieving speedups of up to 1.8X over a 1-GPU baseline.
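
The abstract describes the scheme only at a high level. As a rough, hypothetical illustration of the stale-weight effect it refers to (not the authors' implementation, which pipelines micro-batches across 2 GPUs), the PyTorch sketch below splits a small CNN into two stages and delays the front stage's weight updates by a fixed number of steps, so its forward passes run on weights that lag behind the gradients being applied. The stage definitions, the STALENESS constant, and the train_step helper are illustrative assumptions, not names from the paper.

```python
# Hypothetical sketch: simulating the stale-weight effect of pipelined
# backpropagation for a two-stage split of a small CNN. Staleness is
# modelled by applying the front stage's gradients a fixed number of
# optimizer steps late, as happens when that stage runs ahead in a pipeline.
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

# Front ("early") stage and back stage of an arbitrary small CNN
# (in a real pipeline each stage would live on its own accelerator).
stage0 = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                       nn.MaxPool2d(2))
stage1 = nn.Sequential(nn.Flatten(), nn.Linear(8 * 14 * 14, 10))

opt0 = torch.optim.SGD(stage0.parameters(), lr=0.01)
opt1 = torch.optim.SGD(stage1.parameters(), lr=0.01)

STALENESS = 2        # delay (in steps) before stage0 gradients are applied
pending = deque()    # queue of delayed stage0 gradients

def train_step(x, y):
    """One micro-batch step with simulated pipeline staleness."""
    h = stage0(x)
    out = stage1(h)
    loss = F.cross_entropy(out, y)

    opt0.zero_grad()
    opt1.zero_grad()
    loss.backward()

    # The back stage updates immediately.
    opt1.step()

    # The front stage's gradients are queued and applied STALENESS steps
    # later, so its forward passes meanwhile use stale weights.
    pending.append([p.grad.detach().clone() for p in stage0.parameters()])
    if len(pending) > STALENESS:
        old_grads = pending.popleft()
        for p, g in zip(stage0.parameters(), old_grads):
            p.grad = g
        opt0.step()
    return loss.item()

# Usage on random data standing in for MNIST-sized inputs.
for step in range(10):
    x = torch.randn(32, 1, 28, 28)
    y = torch.randint(0, 10, (32,))
    print(step, train_step(x, y))
```

Setting STALENESS to 0 reduces the loop to ordinary nonpipelined SGD, which loosely mirrors the hybrid pipelined/nonpipelined training the abstract mentions.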