Improving multi-talker binaural DOA estimation by combining periodicity and spatial features in convolutional neural networks

Abstract Deep neural network-based direction of arrival (DOA) estimation systems often rely on spatial features as input to learn a mapping for estimating the DOA of multiple talkers. Aiming to improve the accuracy of multi-talker DOA estimation for binaural hearing aids with a known number of activ...

Bibliographic Details
Main Authors:	Reza Varzandeh, Simon Doclo, Volker Hohmann
Format:	Article
Language:	English
Published:	SpringerOpen 2025-02-01
Series:	EURASIP Journal on Audio, Speech, and Music Processing
Subjects:	Convolutional neural networks; Spatial feature; Periodicity feature; Binaural DOA estimation; Multiple talkers; Feature reduction
Online Access:	https://doi.org/10.1186/s13636-025-00392-8

_version_	1832571467656593408
author	Reza Varzandeh, Simon Doclo, Volker Hohmann
collection	DOAJ
description	Abstract Deep neural network-based direction of arrival (DOA) estimation systems often rely on spatial features as input to learn a mapping for estimating the DOA of multiple talkers. Aiming to improve the accuracy of multi-talker DOA estimation for binaural hearing aids with a known number of active talkers, we investigate the usage of periodicity features as a footprint of speech signals in combination with spatial features as input to a convolutional neural network (CNN). In particular, we propose a multi-talker DOA estimation system employing a two-stage CNN architecture that utilizes cross-power spectrum (CPS) phase as spatial features and an auditory-inspired periodicity feature called periodicity degree (PD) as spectral features. The two-stage CNN incorporates a PD feature reduction stage prior to the joint processing of PD and CPS phase features. We investigate different design choices for the CNN architecture, including varying temporal reduction strategies and spectro-temporal filtering approaches. The performance of the proposed system is evaluated in static source scenarios with 2–3 talkers in two reverberant environments under varying signal-to-noise ratios using recorded background noises. To evaluate the benefit of combining PD features with CPS phase features, we consider baseline systems that utilize either only CPS phase features or combine CPS phase and magnitude spectrogram features. Results show that combining PD and CPS phase features in the proposed system consistently improves DOA estimation accuracy across all conditions, outperforming the two baseline systems. Additionally, the PD feature reduction stage in the proposed system improves DOA estimation accuracy while significantly reducing computational complexity compared to a baseline system without this stage, demonstrating its effectiveness for multi-talker DOA estimation.
format	Article
id	doaj-art-a9f41b4ae23c4f23bdbc592a377c4602
institution	Kabale University
issn	1687-4722
language	English
publishDate	2025-02-01
publisher	SpringerOpen
record_format	Article
series	EURASIP Journal on Audio, Speech, and Music Processing
spelling	doaj-art-a9f41b4ae23c4f23bdbc592a377c4602 | 2025-02-02T12:35:42Z | eng | SpringerOpen | EURASIP Journal on Audio, Speech, and Music Processing | 1687-4722 | 2025-02-01 | 20251118 | 10.1186/s13636-025-00392-8 | Improving multi-talker binaural DOA estimation by combining periodicity and spatial features in convolutional neural networks | Reza Varzandeh, Simon Doclo, Volker Hohmann (all: Department of Medical Physics and Acoustics and the Cluster of Excellence Hearing4all, University of Oldenburg) | https://doi.org/10.1186/s13636-025-00392-8 | Convolutional neural networks; Spatial feature; Periodicity feature; Binaural DOA estimation; Multiple talkers; Feature reduction
title	Improving multi-talker binaural DOA estimation by combining periodicity and spatial features in convolutional neural networks
topic	Convolutional neural networks; Spatial feature; Periodicity feature; Binaural DOA estimation; Multiple talkers; Feature reduction
url	https://doi.org/10.1186/s13636-025-00392-8
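
The description in this record names the cross-power spectrum (CPS) phase as the spatial input feature. As a rough illustration only, and not the authors' implementation, the following NumPy sketch computes a CPS phase map from the left and right channels of a binaural signal. The function name `cps_phase`, the frame length, the hop size, and the Hann window are all assumptions made for this example; the record does not specify them.

```python
import numpy as np


def cps_phase(left, right, frame_len=512, hop=256):
    """Return the CPS phase, shape (n_frames, frame_len // 2 + 1).

    Hypothetical helper: frame_len, hop, and the Hann window are
    illustrative choices, not values taken from the paper.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(left) - frame_len) // hop
    phases = np.empty((n_frames, frame_len // 2 + 1))
    for i in range(n_frames):
        seg = slice(i * hop, i * hop + frame_len)
        spec_left = np.fft.rfft(window * left[seg])
        spec_right = np.fft.rfft(window * right[seg])
        # Cross-power spectrum: left spectrum times the conjugate of the right.
        cps = spec_left * np.conj(spec_right)
        # Keep only the phase, which carries the interaural time difference.
        phases[i] = np.angle(cps)
    return phases
```

With this sign convention, a source whose signal reaches the right channel later than the left yields a positive phase that grows with frequency. Identical channels give a phase of zero in every bin.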


Similar Items