Hardware Efficient Speech Enhancement With Noise Aware Multi-Target Deep Learning

Bibliographic Details
Main Authors: Salinna Abdullah, Majid Zamani, Andreas Demosthenous
Format: Article
Language: English
Published: IEEE, 2024-01-01
Series: IEEE Open Journal of Circuits and Systems
Subjects: Deep neural network; digital circuits; field programmable gate array (FPGA); mapping; masking; multi-target learning
Online Access: https://ieeexplore.ieee.org/document/10500889/
author Salinna Abdullah
Majid Zamani
Andreas Demosthenous
collection DOAJ
description This paper describes a supervised speech enhancement (SE) method utilising a noise-aware four-layer deep neural network and training target switching. For optimal speech denoising, the SE system, trained with multiple-target joint learning, switches between mapping-based, masking-based, or complementary processing, depending on the level of noise contamination detected. Optimisation techniques, including ternary quantisation, structural pruning, efficient sparse matrix representation and cost-effective approximations for complex computations, were implemented to reduce area, memory, and power requirements. Up to 19.1× compression was obtained, and all weights could be stored in on-chip memory. When processing NOISEX-92 noises, the system achieved average short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ) scores of 0.81 and 1.62, respectively, outperforming SE algorithms trained with only a single learning target. The proposed SE processor was implemented on a field programmable gate array (FPGA) for proof of concept. Mapping the design onto a 65-nm CMOS process led to a chip core area of 3.88 mm² and a power consumption of 1.91 mW when operating at a 10 MHz clock frequency.
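To make two of the techniques mentioned in the abstract concrete, the following is a minimal Python sketch of ternary weight quantisation and noise-aware target switching. The 0.7 × mean(|w|) pruning threshold, the SNR cut-offs, and all function and parameter names are illustrative assumptions, not the exact rules used in the paper.

```python
import numpy as np

def ternary_quantise(w, delta_scale=0.7):
    """Illustrative ternary quantisation: map each weight to {-alpha, 0, +alpha}.

    Weights whose magnitude falls below a threshold are pruned to zero; the
    surviving weights share one per-layer scaling factor alpha. The
    0.7 * mean(|w|) threshold is a common heuristic, not the paper's rule.
    """
    delta = delta_scale * np.mean(np.abs(w))      # pruning threshold
    mask = np.abs(w) > delta                      # surviving (non-zero) weights
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    return alpha * np.sign(w) * mask              # values in {-alpha, 0, +alpha}

def select_target(estimated_snr_db, low_thr=0.0, high_thr=10.0):
    """Illustrative noise-aware switch between learning targets.

    The SNR thresholds are placeholders; the paper chooses mapping-based,
    masking-based or complementary processing from the detected noise level.
    """
    if estimated_snr_db >= high_thr:
        return "masking"        # light noise: ratio-mask output
    if estimated_snr_db <= low_thr:
        return "mapping"        # heavy noise: direct spectral mapping
    return "complementary"      # in between: combine both estimates
```

A weight matrix quantised this way becomes both low-precision and highly sparse, which is what makes the compact sparse-matrix storage of all weights in on-chip memory, as described in the abstract, feasible.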
format Article
id doaj-art-629751578c634b40b182822b3723c2a9
institution Kabale University
issn 2644-1225
language English
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Open Journal of Circuits and Systems
spelling IEEE Open Journal of Circuits and Systems (ISSN 2644-1225), vol. 5, pp. 141-152, 2024-01-01. DOI: 10.1109/OJCAS.2024.3389100; IEEE article 10500889. Salinna Abdullah (https://orcid.org/0000-0003-0092-3190), Majid Zamani, and Andreas Demosthenous (https://orcid.org/0000-0003-0623-963X), Department of Electronic and Electrical Engineering, University College London, London, U.K. https://ieeexplore.ieee.org/document/10500889/
title Hardware Efficient Speech Enhancement With Noise Aware Multi-Target Deep Learning
topic Deep neural network
digital circuits
field programmable gate array (FPGA)
mapping
masking
multi-target learning
url https://ieeexplore.ieee.org/document/10500889/