Enhancing On-Device DNN Inference Performance With a Reduced Retention-Time MRAM-Based Memory Architecture

As applications using deep neural networks (DNNs) are increasingly deployed on mobile devices, researchers are exploring various methods to achieve low energy consumption and high performance. Recently, advances in STT-MRAM have shown promise in offering non-volatility, high performance, and low energy consumption when used to replace DRAM in main memory. Most of the memory space used in DNN applications is occupied by weight and activation data. Typically, the contents of weight data remain unchanged during inference, whereas activation data are modified to store intermediate results across the layers of the DNN. As large DNN applications consume significant energy and require high performance, STT-MRAM is an ideal candidate to fully or partially replace DRAM in main memory. However, the long write latency of STT-MRAM compared to DRAM presents a performance bottleneck. In this work, we propose a reduced retention-time MRAM-based main memory to address this issue. We divide the MRAM into multiple partitions, each implemented with a different retention time tailored for DNN applications. In this approach, the DNN weights can be assigned to the long retention-time partition, while the activation data can be allocated to the short retention-time partition, optimizing for the varying characteristics of data reuse. To achieve high performance, we propose two mapping schemes, intra-segment and inter-segment circular buffers, which dynamically map DNN activation data (i.e., virtual pages of streaming data) to physical pages in a circular fashion to exploit the reuse patterns of DNN data. These circular buffers are mapped to the short retention-time MRAM partition as much as possible. The intra-segment circular buffer mapping scheme achieves an average improvement of 14.4% in bandwidth and 12.6% in inference latency compared to DRAM for on-device DNN applications. Furthermore, the inter-segment circular buffer mapping scheme offers 11.1% bandwidth and 11.2% latency improvements on average using only 16 MBytes of the short retention-time partition.
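The abstract describes the key mechanism only at a high level. As a loose, hypothetical illustration of what mapping streaming activation pages "to physical pages in a circular fashion" can mean, the C sketch below wraps activation virtual page numbers around a fixed pool of physical frames in the short retention-time partition; every name and constant in it (map_activation_page, SHORT_RT_PAGES, a 4 KB page size) is invented for this sketch and does not come from the paper.

#include <stdio.h>
#include <stdint.h>

/* Hypothetical sizing: a 16 MB short retention-time partition
 * divided into 4 KB pages gives 4096 physical frames. */
#define SHORT_RT_PAGES 4096u

/* Map a virtual page number (vpn) of streaming activation data to a
 * physical frame in the short retention-time MRAM partition. Because
 * activations are produced and consumed layer by layer, a frame can be
 * overwritten once its data is dead, so the mapping simply wraps
 * around the partition like a circular buffer. */
static inline uint32_t map_activation_page(uint64_t vpn) {
    return (uint32_t)(vpn % SHORT_RT_PAGES);
}

int main(void) {
    /* Two virtual pages exactly SHORT_RT_PAGES apart reuse the same
     * physical frame, i.e., the buffer has wrapped around once. */
    printf("vpn 10   -> frame %u\n", map_activation_page(10));
    printf("vpn %u -> frame %u\n", 10u + SHORT_RT_PAGES,
           map_activation_page(10u + SHORT_RT_PAGES));
    return 0;
}

The 4096-frame pool mirrors the abstract's figure of 16 MBytes for the short retention-time partition at a common 4 KB page size; the paper's actual intra-segment and inter-segment schemes are more elaborate than this single modulo mapping, which is shown only to make the circular-reuse idea concrete.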

Bibliographic Details
Main Authors: Munhyung Lee, Taehan Lee (ORCID: 0009-0009-3576-2770), Junwon Yeo (ORCID: 0009-0004-7712-3102), Hyukjun Lee (ORCID: 0000-0003-2981-0800)
Affiliation: Department of Computer Science and Engineering, Sogang University, Seoul, South Korea
Format: Article
Language: English
Published: IEEE, 2024-01-01
Series: IEEE Access, Vol. 12, pp. 171295-171303
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3496906
Subjects: Deep neural networks; high performance memory architecture; STT-MRAM; retention time
Online Access: https://ieeexplore.ieee.org/document/10752558/