Enhancing On-Device DNN Inference Performance With a Reduced Retention-Time MRAM-Based Memory Architecture

As applications using deep neural networks (DNNs) are increasingly deployed on mobile devices, researchers are exploring various methods to achieve low energy consumption and high performance. Recently, advances in STT-MRAM have shown promise in offering non-volatility, high performance, and low energy consumption when used to replace DRAM in main memory. Most of the memory space used in DNN applications is occupied by weight and activation data. Typically, the contents of weight data remain unchanged during inference, whereas activation data are modified to store intermediate results across the layers of the DNN. As large DNN applications consume significant energy and require high performance, STT-MRAM is an ideal candidate to fully or partially replace DRAM in main memory. However, the long write latency of STT-MRAM compared to DRAM presents a performance bottleneck. In this work, we propose a reduced retention-time MRAM-based main memory to address this issue. We divide the MRAM into multiple partitions, each implemented with a different retention time tailored for DNN applications. In this approach, the DNN weights can be assigned to the long retention-time partition, while the activation data can be allocated to the short retention-time partition, optimizing for the varying characteristics of data reuse. To achieve high performance, we propose two mapping schemes, intra-segment and inter-segment circular buffers, which dynamically map DNN activation data (i.e., virtual pages of streaming data) to physical pages in a circular fashion to exploit the reuse patterns of DNN data. These circular buffers are mapped to the short retention-time MRAM partition as much as possible. The intra-segment circular buffer mapping scheme achieves an average improvement of 14.4% in bandwidth and 12.6% in inference latency compared to DRAM for on-device DNN applications. Furthermore, the inter-segment circular buffer mapping scheme offers 11.1% bandwidth and 11.2% latency improvements on average using only 16 MBytes of the short retention-time partition.
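The abstract describes the key mechanism only at a high level. As a loose, hypothetical illustration of what mapping streaming activation pages "to physical pages in a circular fashion" can mean, the C sketch below wraps activation virtual page numbers around a fixed pool of physical frames in the short retention-time partition; every name and constant in it (map_activation_page, SHORT_RT_PAGES, a 4 KB page size) is invented for this sketch and does not come from the paper.

#include <stdio.h>
#include <stdint.h>

/* Hypothetical sizing: a 16 MB short retention-time partition
 * divided into 4 KB pages gives 4096 physical frames. */
#define SHORT_RT_PAGES 4096u

/* Map a virtual page number (vpn) of streaming activation data to a
 * physical frame in the short retention-time MRAM partition. Because
 * activations are produced and consumed layer by layer, a frame can be
 * overwritten once its data is dead, so the mapping simply wraps
 * around the partition like a circular buffer. */
static inline uint32_t map_activation_page(uint64_t vpn) {
    return (uint32_t)(vpn % SHORT_RT_PAGES);
}

int main(void) {
    /* Two virtual pages exactly SHORT_RT_PAGES apart reuse the same
     * physical frame, i.e., the buffer has wrapped around once. */
    printf("vpn 10   -> frame %u\n", map_activation_page(10));
    printf("vpn %u -> frame %u\n", 10u + SHORT_RT_PAGES,
           map_activation_page(10u + SHORT_RT_PAGES));
    return 0;
}

The 4096-frame pool mirrors the abstract's figure of 16 MBytes for the short retention-time partition at a common 4 KB page size; the paper's actual intra-segment and inter-segment schemes are more elaborate than this single modulo mapping, which is shown only to make the circular-reuse idea concrete.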

Bibliographic Details
Main Authors: Munhyung Lee, Taehan Lee (ORCID: 0009-0009-3576-2770), Junwon Yeo (ORCID: 0009-0004-7712-3102), Hyukjun Lee (ORCID: 0000-0003-2981-0800)
Affiliation: Department of Computer Science and Engineering, Sogang University, Seoul, South Korea
Format: Article
Language: English
Published: IEEE, 2024-01-01
Series: IEEE Access, Vol. 12, pp. 171295-171303
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3496906
Subjects: Deep neural networks; high performance memory architecture; STT-MRAM; retention time
Online Access: https://ieeexplore.ieee.org/document/10752558/