Hardware Efficient Speech Enhancement With Noise Aware Multi-Target Deep Learning
Main Authors: | Salinna Abdullah, Majid Zamani, Andreas Demosthenous |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2024-01-01 |
Series: | IEEE Open Journal of Circuits and Systems |
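Subjects: | Deep neural network; digital circuits; field programmable gate array (FPGA); mapping; masking; multi-target learning |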
Online Access: | https://ieeexplore.ieee.org/document/10500889/ |
author | Salinna Abdullah, Majid Zamani, Andreas Demosthenous |
---|---|
collection | DOAJ |
description | This paper describes a supervised speech enhancement (SE) method that uses a noise-aware four-layer deep neural network with training-target switching. For optimal speech denoising, the SE system, trained with multiple-target joint learning, switches among mapping-based, masking-based, and complementary processing, depending on the level of noise contamination detected. Optimisation techniques, including ternary quantisation, structural pruning, efficient sparse matrix representation, and cost-effective approximations for complex computations, were implemented to reduce area, memory, and power requirements. Up to 19.1× compression was obtained, so that all weights could be stored in on-chip memory. When processing NOISEX-92 noises, the system achieved average short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ) scores of 0.81 and 1.62, respectively, outperforming SE algorithms trained with only a single learning target. The proposed SE processor was implemented on a field programmable gate array (FPGA) as a proof of concept. Mapping the design onto a 65-nm CMOS process resulted in a chip core area of 3.88 mm² and a power consumption of 1.91 mW at a 10 MHz clock frequency. |
id | doaj-art-629751578c634b40b182822b3723c2a9 |
institution | Kabale University |
issn | 2644-1225 |
doi | 10.1109/OJCAS.2024.3389100 |
volume | 5 |
pages | 141–152 |
affiliation | Department of Electronic and Electrical Engineering, University College London, London, U.K. (all three authors) |
orcid | Salinna Abdullah: https://orcid.org/0000-0003-0092-3190; Andreas Demosthenous: https://orcid.org/0000-0003-0623-963X |
topic | Deep neural network; digital circuits; field programmable gate array (FPGA); mapping; masking; multi-target learning |
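The abstract lists ternary quantisation among the optimisations but does not state the exact scheme. As a rough illustration only, the Python sketch below shows one common threshold-based ternarisation: each weight becomes a code in {-1, 0, +1} plus a single shared scale. The threshold ratio of 0.7 × mean(|w|), the 32-bit floating-point baseline, and the helper names `ternarise`, `dequantise` and `storage_bits` are all assumptions for this example, not details taken from the paper. The last function shows the simple arithmetic behind quantisation-only compression; the reported figure of up to 19.1× presumably also reflects the pruning and sparse-storage steps, which are not modelled here.

```python
from statistics import mean


def ternarise(weights, delta_ratio=0.7):
    """Quantise real-valued weights to codes in {-1, 0, +1} plus one scale alpha.

    Assumption: threshold delta = delta_ratio * mean(|w|), a common heuristic
    for ternary weights; not necessarily the scheme used in the paper.
    """
    delta = delta_ratio * mean(abs(w) for w in weights)
    codes = [1 if w > delta else -1 if w < -delta else 0 for w in weights]
    # Shared scale: mean magnitude of the weights that survived the threshold.
    survivors = [abs(w) for w, c in zip(weights, codes) if c != 0]
    alpha = mean(survivors) if survivors else 0.0
    return codes, alpha


def dequantise(codes, alpha):
    """Reconstruct approximate weights as alpha * code."""
    return [alpha * c for c in codes]


def storage_bits(n_weights, float_bits=32, code_bits=2):
    """Bits for dense float weights vs. dense 2-bit ternary codes.

    Ignores the scale factor and any sparse-index overhead (assumed baseline).
    """
    return n_weights * float_bits, n_weights * code_bits


if __name__ == "__main__":
    w = [0.9, -0.05, 0.4, -0.8, 0.02, -0.3]
    codes, alpha = ternarise(w)
    print(codes, round(alpha, 3))  # [1, 0, 1, -1, 0, -1] 0.6
    dense, ternary = storage_bits(len(w))
    print(dense / ternary)  # 16.0: compression from 2-bit codes alone
```

Zeroed weights are what make a sparse matrix representation worthwhile: only the non-zero codes and their positions need to be stored, which is one plausible route to compression beyond the 16× that 2-bit codes give on their own.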