Speech Enhancement Using Joint DNN-NMF Model Learned with Multi-Objective Frequency Differential Spectrum Loss Function

We propose a multi-objective joint model of non-negative matrix factorization (NMF) and deep neural network (DNN) with a new loss function for speech enhancement. The proposed loss function (LMOFD) is a weighted combination of a frequency differential spectrum mean squared error (MSE)-based loss fun...

Bibliographic Details
Main Authors:	Matin Pashaian, Sanaz Seyedin
Format:	Article
Language:	English
Published:	Wiley 2024-01-01
Series:	IET Signal Processing
Online Access:	http://dx.doi.org/10.1049/2024/8881007

author	Matin Pashaian Sanaz Seyedin
collection	DOAJ
description	We propose a multi-objective joint model of non-negative matrix factorization (NMF) and deep neural network (DNN) with a new loss function for speech enhancement. The proposed loss function (LMOFD) is a weighted combination of a frequency differential spectrum mean squared error (MSE)-based loss function (LFD) and a multi-objective MSE loss function (LMO). The conventional MSE loss function computes the discrepancy between the estimated speech and the clean speech across all frequencies, disregarding how the amplitude changes along the frequency axis, which contains valuable information. The differential spectrum representation retains spectral peaks that carry important information, so using it helps to ensure that this information in the speech signal is preserved. In addition, noise spectra typically have a flat shape, and because the differential operation drives flat spectral parts close to zero, the differential spectrum is resistant to noises with smooth structures. Thus, we propose a frequency-differentiated loss function that considers the magnitude spectrum differences between neighboring frequency bins in each time frame. This approach maintains the spectral variations of the target signal in the frequency domain, which can effectively reduce the deteriorating effects of noise. The multi-objective MSE term LMO combines two loss functions: one related to the NMF coefficients, which are the intermediate output targets, and one related to the original spectral signals, which are the actual output targets. The use of encoded NMF coefficients as low-dimensional structural features for the DNN serves as prior knowledge and helps the learning process. LMO is used alongside LFD to take advantage of the properties of both the original and the differential spectrum in the training loss function. Moreover, a DNN-based noise classification and fusion (NCF) strategy is proposed to exploit a discriminative model for noise reduction. The experiments show that the proposed approach improves on previous methods.
format	Article
id	doaj-art-49fa612dc9604bd59744415ac6dfb9ed
institution	Kabale University
issn	1751-9683
language	English
publishDate	2024-01-01
publisher	Wiley
record_format	Article
series	IET Signal Processing
title	Speech Enhancement Using Joint DNN-NMF Model Learned with Multi-Objective Frequency Differential Spectrum Loss Function
url	http://dx.doi.org/10.1049/2024/8881007
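
The loss described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so the first-order difference along the frequency axis, the weights `alpha` and `beta`, and all function names below are assumptions made for illustration only.

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))


def frequency_differential(spec):
    """First-order difference between neighboring frequency bins.

    spec has shape (frames, bins); the result has shape (frames, bins - 1).
    """
    return np.diff(spec, axis=1)


def lfd_loss(est_spec, clean_spec):
    """Assumed form of LFD: MSE between the differential spectra."""
    return mse(frequency_differential(est_spec),
               frequency_differential(clean_spec))


def lmo_loss(est_spec, clean_spec, est_nmf, clean_nmf, beta=0.5):
    """Assumed form of LMO: weighted MSE on NMF coefficients and spectra.

    beta is a hypothetical weight; the abstract does not state one.
    """
    return beta * mse(est_nmf, clean_nmf) + (1.0 - beta) * mse(est_spec, clean_spec)


def lmofd_loss(est_spec, clean_spec, est_nmf, clean_nmf, alpha=0.5, beta=0.5):
    """Assumed form of LMOFD: weighted combination of LFD and LMO.

    alpha is a hypothetical weight; the abstract only says "weighted".
    """
    return (alpha * lfd_loss(est_spec, clean_spec)
            + (1.0 - alpha) * lmo_loss(est_spec, clean_spec, est_nmf, clean_nmf, beta))


# One frame, four frequency bins; the estimate carries a flat offset of 0.5.
clean_spec = np.array([[1.0, 3.0, 2.0, 5.0]])
est_spec = clean_spec + 0.5
nmf_coeffs = np.array([[0.2, 0.8]])  # hypothetical NMF activations, estimated perfectly

plain = mse(est_spec, clean_spec)
differential = lfd_loss(est_spec, clean_spec)
combined = lmofd_loss(est_spec, clean_spec, nmf_coeffs, nmf_coeffs)
print(plain, differential, combined)
```

In this toy case the flat offset is fully visible to the plain MSE but vanishes in the differential term, which is the robustness to smooth noise that the abstract attributes to the differential spectrum. Keeping the LMO term in the combination means such an offset is still penalized on the original spectrum.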

Similar Items