Speech Enhancement Using Joint DNN-NMF Model Learned with Multi-Objective Frequency Differential Spectrum Loss Function

We propose a multi-objective joint model of non-negative matrix factorization (NMF) and deep neural network (DNN) with a new loss function for speech enhancement. The proposed loss function (LMOFD) is a weighted combination of a frequency differential spectrum mean squared error (MSE)-based loss fun...

Bibliographic Details
Main Authors: Matin Pashaian, Sanaz Seyedin
Format: Article
Language: English
Published: Wiley 2024-01-01
Series: IET Signal Processing
Online Access: http://dx.doi.org/10.1049/2024/8881007
collection DOAJ
description We propose a multi-objective joint model of non-negative matrix factorization (NMF) and a deep neural network (DNN), trained with a new loss function, for speech enhancement. The proposed loss function (LMOFD) is a weighted combination of a frequency differential spectrum loss based on the mean squared error (MSE), LFD, and a multi-objective MSE loss, LMO. The conventional MSE loss function computes the discrepancy between the estimated speech and the clean speech across all frequencies, disregarding how the amplitude changes along the frequency axis, which carries valuable information. The differential spectrum representation retains the spectral peaks, so using it helps to preserve this information in the speech signal. In addition, noise spectra are typically flat, and because the differential operation drives the flat parts of a spectrum close to zero, the differential spectrum is robust to noises with smooth structures. We therefore propose a frequency-differentiated loss function that considers the differences in the magnitude spectrum between neighboring frequency bins in each time frame. This approach maintains the spectral variations of the target signal in the frequency domain, which can reduce the degrading effects of noise. The multi-objective MSE term LMO combines two loss functions: one on the NMF coefficients, which are the intermediate output targets, and one on the original spectral signals, which are the actual output targets. The encoded NMF coefficients, used as low-dimensional structural features for the DNN, serve as prior knowledge and help the learning process. LMO is used alongside LFD so that the training loss exploits the properties of both the original and the differential spectrum. Moreover, a DNN-based noise classification and fusion (NCF) strategy is proposed to exploit a discriminative model for noise reduction. The experiments show that the proposed approach improves on previous methods.
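The loss described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the weights `alpha` and `beta`, the convex-combination form of the weighting, and the use of a first-order difference along the frequency axis are all assumptions made for illustration, since the record gives no formulas.

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))


def frequency_differential(mag):
    """Difference between neighboring frequency bins in each time frame.

    mag has shape (frames, bins); the result has shape (frames, bins - 1).
    A first-order difference is an assumption of this sketch.
    """
    return np.diff(mag, axis=1)


def fd_loss(est_mag, clean_mag):
    """MSE between the frequency-differential spectra (plays the role of LFD)."""
    return mse(frequency_differential(est_mag), frequency_differential(clean_mag))


def mo_loss(est_mag, clean_mag, est_nmf, clean_nmf, beta=0.5):
    """Multi-objective MSE (plays the role of LMO): NMF coefficients as the
    intermediate target, spectra as the actual target.
    beta is a hypothetical weight, not a value from the paper."""
    return beta * mse(est_nmf, clean_nmf) + (1.0 - beta) * mse(est_mag, clean_mag)


def mofd_loss(est_mag, clean_mag, est_nmf, clean_nmf, alpha=0.5, beta=0.5):
    """Weighted combination of the two terms (plays the role of LMOFD).
    The alpha / (1 - alpha) weighting is an assumption of this sketch."""
    return (alpha * fd_loss(est_mag, clean_mag)
            + (1.0 - alpha) * mo_loss(est_mag, clean_mag, est_nmf, clean_nmf, beta))


# Toy data: a "clean" magnitude spectrogram and an estimate corrupted by a
# perfectly flat offset, standing in for noise with a flat spectrum.
rng = np.random.default_rng(0)
clean = rng.random((4, 16))          # 4 frames, 16 frequency bins
estimate = clean + 0.3               # flat additive offset in every bin
nmf_clean = rng.random((4, 8))       # hypothetical NMF activations
nmf_est = nmf_clean.copy()           # pretend the intermediate target is matched

print("plain MSE:        ", mse(estimate, clean))
print("differential MSE: ", fd_loss(estimate, clean))
print("combined loss:    ", mofd_loss(estimate, clean, nmf_est, nmf_clean))
```

In this toy case the flat offset is fully visible to the plain MSE but cancels in the differential term, which mirrors the abstract's argument that the differential spectrum is insensitive to flat noise components.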
id doaj-art-49fa612dc9604bd59744415ac6dfb9ed
institution Kabale University
issn 1751-9683
spelling Matin Pashaian (Speech Processing Research Lab); Sanaz Seyedin (Speech Processing Research Lab)