Pipelined Training with Stale Weights in Deep Convolutional Neural Networks

Bibliographic Details
Main Authors: Lifu Zhang, Tarek S. Abdelrahman
Author Affiliation: Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada
Format: Article
Language: English
Published: Wiley, 2021-01-01
Series: Applied Computational Intelligence and Soft Computing
ISSN: 1687-9724, 1687-9732
Online Access: http://dx.doi.org/10.1155/2021/3839543

Abstract
The growth in size and complexity of convolutional neural networks (CNNs) is forcing the partitioning of a network across multiple accelerators during training and pipelining of backpropagation computations over these accelerators. Pipelining results in the use of stale weights. Existing approaches to pipelined training avoid or limit the use of stale weights with techniques that either underutilize accelerators or increase training memory footprint. This paper contributes a pipelined backpropagation scheme that uses stale weights to maximize accelerator utilization and keep memory overhead modest. It explores the impact of stale weights on statistical efficiency and performance using 4 CNNs (LeNet-5, AlexNet, VGG, and ResNet) and shows that when pipelining is introduced in early layers, training with stale weights converges and results in models with inference accuracies comparable to those from nonpipelined training (a drop in accuracy of 0.4%, 4%, 0.83%, and 1.45% for the 4 networks, respectively). However, when pipelining is deeper in the network, inference accuracies drop significantly (up to 12% for VGG and 8.5% for ResNet-20). The paper also contributes a hybrid training scheme that combines pipelined with nonpipelined training to address this drop. The potential for performance improvement of the proposed scheme is demonstrated with a proof-of-concept pipelined backpropagation implementation in PyTorch on 2 GPUs using ResNet-56/110/224/362, achieving speedups of up to 1.8X over a 1-GPU baseline.
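
The abstract describes the scheme only at a high level. As a rough, hypothetical illustration of the stale-weight effect it refers to (not the authors' implementation, which pipelines micro-batches across 2 GPUs), the PyTorch sketch below splits a small CNN into two stages and delays the front stage's weight updates by a fixed number of steps, so its forward passes run on weights that lag behind the gradients being applied. The stage definitions, the STALENESS constant, and the train_step helper are illustrative assumptions, not names from the paper.

```python
# Hypothetical sketch: simulating the stale-weight effect of pipelined
# backpropagation for a two-stage split of a small CNN. Staleness is
# modelled by applying the front stage's gradients a fixed number of
# optimizer steps late, as happens when that stage runs ahead in a pipeline.
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

# Front ("early") stage and back stage of an arbitrary small CNN
# (in a real pipeline each stage would live on its own accelerator).
stage0 = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                       nn.MaxPool2d(2))
stage1 = nn.Sequential(nn.Flatten(), nn.Linear(8 * 14 * 14, 10))

opt0 = torch.optim.SGD(stage0.parameters(), lr=0.01)
opt1 = torch.optim.SGD(stage1.parameters(), lr=0.01)

STALENESS = 2        # delay (in steps) before stage0 gradients are applied
pending = deque()    # queue of delayed stage0 gradients

def train_step(x, y):
    """One micro-batch step with simulated pipeline staleness."""
    h = stage0(x)
    out = stage1(h)
    loss = F.cross_entropy(out, y)

    opt0.zero_grad()
    opt1.zero_grad()
    loss.backward()

    # The back stage updates immediately.
    opt1.step()

    # The front stage's gradients are queued and applied STALENESS steps
    # later, so its forward passes meanwhile use stale weights.
    pending.append([p.grad.detach().clone() for p in stage0.parameters()])
    if len(pending) > STALENESS:
        old_grads = pending.popleft()
        for p, g in zip(stage0.parameters(), old_grads):
            p.grad = g
        opt0.step()
    return loss.item()

# Usage on random data standing in for MNIST-sized inputs.
for step in range(10):
    x = torch.randn(32, 1, 28, 28)
    y = torch.randint(0, 10, (32,))
    print(step, train_step(x, y))
```

Setting STALENESS to 0 reduces the loop to ordinary nonpipelined SGD, which loosely mirrors the hybrid pipelined/nonpipelined training the abstract mentions.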