A Low-Power Streaming Speech Enhancement Accelerator for Edge Devices
Transformer-based speech enhancement models yield impressive results. However, their heterogeneous and complex structure restricts model compression potential, resulting in greater complexity and reduced hardware efficiency. Additionally, these models are not tailored for streaming and low-power applications.
Main Authors: | Ci-Hao Wu, Tian-Sheuan Chang |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2024-01-01 |
Series: | IEEE Open Journal of Circuits and Systems |
Subjects: | Speech enhancement; transformer; low power; hardware implementation |
Online Access: | https://ieeexplore.ieee.org/document/10496994/ |
_version_ | 1832592898642673664 |
---|---|
author | Ci-Hao Wu; Tian-Sheuan Chang |
author_facet | Ci-Hao Wu; Tian-Sheuan Chang |
author_sort | Ci-Hao Wu |
collection | DOAJ |
description | Transformer-based speech enhancement models yield impressive results. However, their heterogeneous and complex structure restricts model compression potential, resulting in greater complexity and reduced hardware efficiency. Additionally, these models are not tailored for streaming and low-power applications. Addressing these challenges, this paper proposes a low-power streaming speech enhancement accelerator through model and hardware optimization. The proposed high-performance model is optimized for hardware execution by co-designing model compression with the target application, reducing model size by 93.9% through the proposed domain-aware and streaming-aware pruning techniques. Latency is further reduced with batch-normalization-based transformers. Additionally, we employ softmax-free attention, complemented by an extra batch normalization, which enables a simpler hardware design. The tailored hardware accommodates these diverse computing patterns by breaking them down into element-wise multiply-accumulate (MAC) operations, realized with a 1-D processing array and configurable SRAM addressing, which minimizes hardware complexity and simplifies zero skipping. Implemented in a TSMC 40 nm CMOS process, the design requires only 207.8K gates and 53.75 KB of SRAM, and consumes 8.08 mW for real-time inference at 62.5 MHz. |
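The description mentions softmax-free attention followed by an extra batch normalization to simplify the hardware. A minimal sketch of that idea, not the paper's exact formulation: the tensor shapes, the unbatched single-head layout, and the placement of the normalization are assumptions for illustration only.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Stand-in for a trained BN layer: normalize each feature over the time axis.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax_free_attention(q, k, v):
    # Use raw scaled dot-product scores directly (no exp/softmax, which is
    # costly in hardware), then rely on a batch normalization to keep the
    # output activations well-scaled.
    d = q.shape[-1]
    scores = (q @ k.T) / d       # (T, T) score matrix, no softmax applied
    out = scores @ v             # (T, d) weighted combination of values
    return batch_norm(out)       # extra BN in place of softmax normalization

rng = np.random.default_rng(0)
T, d = 8, 4
q, k, v = rng.standard_normal((3, T, d))
y = softmax_free_attention(q, k, v)
print(y.shape)  # (8, 4)
```

Dropping the softmax removes the exponentiation and row-wise division, leaving only multiplies and adds, which matches the element-wise MAC decomposition the abstract describes.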
format | Article |
id | doaj-art-f09a3025251843f3b0f67134760946d9 |
institution | Kabale University |
issn | 2644-1225 |
language | English |
publishDate | 2024-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Open Journal of Circuits and Systems |
spelling | doaj-art-f09a3025251843f3b0f67134760946d9; 2025-01-21T00:02:46Z; eng; IEEE; IEEE Open Journal of Circuits and Systems; 2644-1225; 2024-01-01; vol. 5, pp. 128-140; 10.1109/OJCAS.2024.3387849; 10496994; A Low-Power Streaming Speech Enhancement Accelerator for Edge Devices; Ci-Hao Wu (https://orcid.org/0009-0007-7420-7150), Institute of Electronics, National Yang Ming Chiao Tung University, Hsinchu, Taiwan; Tian-Sheuan Chang (https://orcid.org/0000-0002-0561-8745), Institute of Electronics, National Yang Ming Chiao Tung University, Hsinchu, Taiwan; Transformer-based speech enhancement models yield impressive results. However, their heterogeneous and complex structure restricts model compression potential, resulting in greater complexity and reduced hardware efficiency. Additionally, these models are not tailored for streaming and low-power applications. Addressing these challenges, this paper proposes a low-power streaming speech enhancement accelerator through model and hardware optimization. The proposed high-performance model is optimized for hardware execution by co-designing model compression with the target application, reducing model size by 93.9% through the proposed domain-aware and streaming-aware pruning techniques. Latency is further reduced with batch-normalization-based transformers. Additionally, we employ softmax-free attention, complemented by an extra batch normalization, which enables a simpler hardware design. The tailored hardware accommodates these diverse computing patterns by breaking them down into element-wise multiply-accumulate (MAC) operations, realized with a 1-D processing array and configurable SRAM addressing, which minimizes hardware complexity and simplifies zero skipping. Implemented in a TSMC 40 nm CMOS process, the design requires only 207.8K gates and 53.75 KB of SRAM, and consumes 8.08 mW for real-time inference at 62.5 MHz. https://ieeexplore.ieee.org/document/10496994/; Speech enhancement; transformer; low power; hardware implementation |
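The abstract credits most of the 93.9% model-size reduction to domain-aware and streaming-aware pruning. Those criteria are not spelled out in this record; as a rough illustration of what pruning to that sparsity level means, here is a plain magnitude-pruning sketch (the 0.939 target merely mirrors the quoted figure, and the paper's actual pruning is presumably structured rather than unstructured magnitude pruning).

```python
import numpy as np

def magnitude_prune(w, sparsity=0.939):
    # Zero out the smallest-magnitude weights until `sparsity` of them are zero.
    k = int(np.ceil(sparsity * w.size))
    if k == 0:
        return w.copy()
    # k-th smallest absolute value becomes the pruning threshold.
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)

w = np.random.default_rng(1).standard_normal((64, 64))
wp = magnitude_prune(w)
print(round(float((wp == 0).mean()), 3))  # zero fraction close to 0.939
```

The resulting zeros are what the accelerator's zero-skipping logic, mentioned in the abstract, would exploit at inference time.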
spellingShingle | Ci-Hao Wu Tian-Sheuan Chang A Low-Power Streaming Speech Enhancement Accelerator for Edge Devices IEEE Open Journal of Circuits and Systems Speech enhancement transformer low power hardware implementation |
title | A Low-Power Streaming Speech Enhancement Accelerator for Edge Devices |
title_full | A Low-Power Streaming Speech Enhancement Accelerator for Edge Devices |
title_fullStr | A Low-Power Streaming Speech Enhancement Accelerator for Edge Devices |
title_full_unstemmed | A Low-Power Streaming Speech Enhancement Accelerator for Edge Devices |
title_short | A Low-Power Streaming Speech Enhancement Accelerator for Edge Devices |
title_sort | low power streaming speech enhancement accelerator for edge devices |
topic | Speech enhancement transformer low power hardware implementation |
url | https://ieeexplore.ieee.org/document/10496994/ |
work_keys_str_mv | AT cihaowu alowpowerstreamingspeechenhancementacceleratorforedgedevices AT tiansheuanchang alowpowerstreamingspeechenhancementacceleratorforedgedevices AT cihaowu lowpowerstreamingspeechenhancementacceleratorforedgedevices AT tiansheuanchang lowpowerstreamingspeechenhancementacceleratorforedgedevices |