Three-stage hybrid spiking neural networks fine-tuning for speech enhancement

Bibliographic Details
Main Authors: Nidal Abuhajar, Zhewei Wang, Marc Baltes, Ye Yue, Li Xu, Avinash Karanth, Charles D. Smith, Jundong Liu
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-04-01
Series: Frontiers in Neuroscience
Subjects:
Online Access: https://www.frontiersin.org/articles/10.3389/fnins.2025.1567347/full
Description
Summary: Introduction: In the past decade, artificial neural networks (ANNs) have revolutionized many AI-related fields, including Speech Enhancement (SE). However, achieving high performance with ANNs often requires substantial power and memory resources. Recently, spiking neural networks (SNNs) have emerged as a promising low-power alternative to ANNs, leveraging their inherent sparsity to enable efficient computation while maintaining performance.

Method: While SNNs offer improved energy efficiency, they are generally more challenging to train than ANNs. In this study, we propose a three-stage hybrid ANN-to-SNN fine-tuning scheme and apply it to Wave-U-Net and ConvTasNet, two major network solutions for speech enhancement. Our framework first trains the ANN models and then converts them into their corresponding spiking versions. The converted SNNs are subsequently fine-tuned with a hybrid training scheme, in which the forward pass uses spiking signals and the backward pass uses ANN signals to enable backpropagation. To maintain the performance of the original ANN models, we make various modifications to the original network architectures. Our SNN models operate entirely in the temporal domain, eliminating the need to convert wave signals into the spectral domain for input and back to the waveform for output. Moreover, our models use spiking neurons throughout, setting them apart from many models that incorporate regular ANN neurons in their architectures.

Results and discussion: Experiments on noisy VCTK and TIMIT datasets demonstrate the effectiveness of the hybrid training: the fine-tuned SNNs show significant improvement and robustness over the baseline models.
ISSN: 1662-453X
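
The hybrid scheme described in the summary, where spikes propagate in the forward pass while a differentiable ANN-style signal drives the backward pass, is commonly realised with a surrogate-gradient (straight-through) spiking activation. The PyTorch sketch below illustrates that general idea only; it is not the authors' implementation, and the HybridSpike/spike names, the unit threshold, and the sigmoid surrogate are assumptions introduced here.

import torch

class HybridSpike(torch.autograd.Function):
    # Forward pass: emit binary spikes (non-differentiable step function).
    # Backward pass: substitute the gradient of a smooth sigmoid surrogate,
    # so standard backpropagation can fine-tune the converted SNN.
    @staticmethod
    def forward(ctx, membrane_potential, threshold):
        ctx.save_for_backward(membrane_potential)
        ctx.threshold = threshold
        return (membrane_potential >= threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        (membrane_potential,) = ctx.saved_tensors
        # ANN-style gradient: derivative of a sigmoid centred at the threshold.
        s = torch.sigmoid(membrane_potential - ctx.threshold)
        return grad_output * s * (1.0 - s), None

def spike(x, threshold=1.0):
    return HybridSpike.apply(x, threshold)

# Hypothetical usage inside a temporal-domain (waveform) encoder layer:
waveforms = torch.randn(4, 1, 16000)  # batch of raw audio (assumed shape)
encoder = torch.nn.Conv1d(in_channels=1, out_channels=32, kernel_size=16, stride=8)
spikes = spike(encoder(waveforms))    # spiking forward pass
spikes.sum().backward()               # surrogate gradient reaches encoder weights

In the paper's three-stage pipeline, an activation of this kind would replace the ANN nonlinearities after conversion, and the fine-tuning stage would train through it; the actual conversion and fine-tuning details are described in the article itself.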