Comparison of discrete transforms for deep‐neural‐networks‐based speech enhancement

Abstract In recent studies of speech enhancement, a deep‐learning model is trained to predict clean speech spectra from the known noisy spectra of speech. Rather than using the traditional discrete Fourier transform (DFT), this paper considers other well‐known transforms to generate the speech spectra for deep‐learning‐based speech enhancement. In addition to the DFT, seven different transforms were tested: discrete Cosine transform, discrete Sine transform, discrete Haar transform, discrete Hadamard transform, discrete Tchebichef transform, discrete Krawtchouk transform, and discrete Tchebichef‐Krawtchouk transform. Two deep‐learning architectures were tested: convolutional neural networks (CNN) and fully connected neural networks. Experiments were performed for the NOIZEUS database, and various speech quality and intelligibility measures were adopted for performance evaluation. The quality and intelligibility scores of the enhanced speech demonstrate that discrete Sine transformation is better suited for the front‐end processing with a CNN as it outperformed the DFT in this kind of application. The achieved results demonstrate that combining two or more existing transforms could improve the performance in specific conditions. The tested models suggest that we should not assume that the DFT is optimal in front‐end processing with deep neural networks (DNNs). On this basis, other discrete transformations should be taken into account when designing robust DNN‐based speech processing applications.


Bibliographic Details
Main Authors: Wissam A. Jassim, Naomi Harte
Format: Article
Language:English
Published: Wiley 2022-06-01
Series:IET Signal Processing
Subjects:
Online Access:https://doi.org/10.1049/sil2.12109
_version_ 1850174858118299648
author Wissam A. Jassim
Naomi Harte
author_facet Wissam A. Jassim
Naomi Harte
author_sort Wissam A. Jassim
collection DOAJ
description Abstract In recent studies of speech enhancement, a deep‐learning model is trained to predict clean speech spectra from the known noisy spectra of speech. Rather than using the traditional discrete Fourier transform (DFT), this paper considers other well‐known transforms to generate the speech spectra for deep‐learning‐based speech enhancement. In addition to the DFT, seven different transforms were tested: discrete Cosine transform, discrete Sine transform, discrete Haar transform, discrete Hadamard transform, discrete Tchebichef transform, discrete Krawtchouk transform, and discrete Tchebichef‐Krawtchouk transform. Two deep‐learning architectures were tested: convolutional neural networks (CNN) and fully connected neural networks. Experiments were performed for the NOIZEUS database, and various speech quality and intelligibility measures were adopted for performance evaluation. The quality and intelligibility scores of the enhanced speech demonstrate that discrete Sine transformation is better suited for the front‐end processing with a CNN as it outperformed the DFT in this kind of application. The achieved results demonstrate that combining two or more existing transforms could improve the performance in specific conditions. The tested models suggest that we should not assume that the DFT is optimal in front‐end processing with deep neural networks (DNNs). On this basis, other discrete transformations should be taken into account when designing robust DNN‐based speech processing applications.
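The abstract's front-end idea — replacing the DFT spectra fed to the DNN with other discrete transforms such as the DCT or DST — can be illustrated with a short sketch. This is not the paper's implementation; it is a minimal, assumed feature-extraction example using SciPy's `dct`/`dst` routines, with arbitrary frame length, hop size, and toy signal chosen for illustration:

```python
import numpy as np
from scipy.fft import rfft, dct, dst

def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def spectra(frames, transform="dft"):
    """Per-frame spectra under different discrete transforms."""
    if transform == "dft":
        return np.abs(rfft(frames, axis=1))               # DFT magnitude
    if transform == "dct":
        return dct(frames, type=2, norm="ortho", axis=1)  # real-valued DCT-II
    if transform == "dst":
        return dst(frames, type=2, norm="ortho", axis=1)  # real-valued DST-II
    raise ValueError(f"unknown transform: {transform}")

# Toy noisy signal: a 440 Hz tone plus white noise at 8 kHz sampling rate.
rng = np.random.default_rng(0)
t = np.arange(4096) / 8000.0
x = np.sin(2 * np.pi * 440 * t) + 0.3 * rng.standard_normal(t.size)

frames = frame_signal(x)
feats_dft = spectra(frames, "dft")  # baseline DNN input (frames x 129)
feats_dst = spectra(frames, "dst")  # alternative DNN input (frames x 256)
```

One practical appeal of the DCT/DST front ends, as used here, is that they are real-valued, so the network predicts a single real spectrum per frame rather than separate magnitude and phase (or real and imaginary) components.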
format Article
id doaj-art-c94de3e971fc48e2b07a00c915feb592
institution OA Journals
issn 1751-9675
1751-9683
language English
publishDate 2022-06-01
publisher Wiley
record_format Article
series IET Signal Processing
spelling doaj-art-c94de3e971fc48e2b07a00c915feb592 2025-08-20T02:19:34Z eng
Wiley, IET Signal Processing, ISSN 1751-9675 / 1751-9683, 2022-06-01, vol. 16, no. 4, pp. 438-448, doi:10.1049/sil2.12109
Comparison of discrete transforms for deep‐neural‐networks‐based speech enhancement
Wissam A. Jassim; Naomi Harte (Sigmedia, ADAPT Centre, School of Engineering, Trinity College Dublin, Dublin, Ireland)
https://doi.org/10.1049/sil2.12109
discrete transforms; feedforward neural nets; speech enhancement
spellingShingle Wissam A. Jassim
Naomi Harte
Comparison of discrete transforms for deep‐neural‐networks‐based speech enhancement
IET Signal Processing
discrete transforms
feedforward neural nets
speech enhancement
title Comparison of discrete transforms for deep‐neural‐networks‐based speech enhancement
title_full Comparison of discrete transforms for deep‐neural‐networks‐based speech enhancement
title_fullStr Comparison of discrete transforms for deep‐neural‐networks‐based speech enhancement
title_full_unstemmed Comparison of discrete transforms for deep‐neural‐networks‐based speech enhancement
title_short Comparison of discrete transforms for deep‐neural‐networks‐based speech enhancement
title_sort comparison of discrete transforms for deep neural networks based speech enhancement
topic discrete transforms
feedforward neural nets
speech enhancement
url https://doi.org/10.1049/sil2.12109
work_keys_str_mv AT wissamajassim comparisonofdiscretetransformsfordeepneuralnetworksbasedspeechenhancement
AT naomiharte comparisonofdiscretetransformsfordeepneuralnetworksbasedspeechenhancement