A Spoofing Speech Detection Method Combining Multi-Scale Features and Cross-Layer Information

Bibliographic Details
Main Authors: Hongyan Yuan, Linjuan Zhang, Baoning Niu, Xianrong Zheng
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Information
Online Access: https://www.mdpi.com/2078-2489/16/3/194
Description
Summary: Pre-trained self-supervised speech models can extract general acoustic features, providing inputs for a variety of downstream speech tasks. Spoofing speech detection, a pressing issue in the age of generative AI, requires both global information and local features of speech. The multi-layer transformer structure in pre-trained speech models effectively captures temporal information and global context, but leaves room for improvement in handling local features. To address this, a spoofing speech detection method that integrates multi-scale features and cross-layer information is proposed. The method introduces a multi-scale feature adapter (MSFA), which strengthens the model's ability to perceive local features through residual convolutional blocks and squeeze-and-excitation (SE) mechanisms. In addition, cross-adaptable weights (CAWs) guide the model toward task-relevant shallow information, enabling effective fusion of features from different layers of the pre-trained model. Experimental results show that the proposed method achieved equal error rates (EERs) of 0.36% and 4.29% on the ASVspoof2019 logical access (LA) and ASVspoof2021 LA datasets, respectively, demonstrating excellent detection performance and generalization ability.
ISSN: 2078-2489
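
Note: The abstract outlines two mechanisms, a multi-scale feature adapter (residual convolutional blocks with squeeze-and-excitation) and cross-adaptable weights that fuse the pre-trained model's layer outputs. The PyTorch sketch below illustrates how such components could be wired together; the module names, kernel sizes, reduction ratio, and softmax-weighted fusion scheme are illustrative assumptions, not the authors' published implementation.

```python
# Minimal sketch of an SE-gated residual conv adapter and learnable
# cross-layer fusion. All hyperparameters here are assumptions for
# illustration, not the paper's reported configuration.
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-excitation: reweight channels using global context."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):              # x: (batch, channels, time)
        s = x.mean(dim=-1)             # squeeze: average over time
        w = self.fc(s).unsqueeze(-1)   # excitation: per-channel weights
        return x * w                   # reweight local feature maps


class MultiScaleFeatureAdapter(nn.Module):
    """Residual conv block + SE applied to one transformer layer's output."""
    def __init__(self, dim, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.conv = nn.Sequential(
            nn.Conv1d(dim, dim, kernel_size, padding=pad),
            nn.ReLU(inplace=True),
            nn.Conv1d(dim, dim, kernel_size, padding=pad),
        )
        self.se = SEBlock(dim)

    def forward(self, x):              # x: (batch, time, dim)
        h = x.transpose(1, 2)          # -> (batch, dim, time) for Conv1d
        h = self.se(self.conv(h))
        return x + h.transpose(1, 2)   # residual connection


class CrossLayerFusion(nn.Module):
    """Learnable per-layer weights over all hidden states of the
    pre-trained model, letting shallow task-relevant layers be emphasized."""
    def __init__(self, num_layers):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_layers))

    def forward(self, hidden_states):  # sequence of (batch, time, dim)
        w = torch.softmax(self.weights, dim=0)
        stacked = torch.stack(list(hidden_states), dim=0)  # (L, B, T, D)
        return (w.view(-1, 1, 1, 1) * stacked).sum(dim=0)  # (B, T, D)
```

As a usage sketch, hidden_states could be the per-layer outputs of a self-supervised speech encoder (e.g., the hidden_states tuple a Hugging Face wav2vec 2.0 model returns when called with output_hidden_states=True), with the fused representation passed to a downstream classification head.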