A Spoofing Speech Detection Method Combining Multi-Scale Features and Cross-Layer Information

Bibliographic Details
Main Authors: Hongyan Yuan, Linjuan Zhang, Baoning Niu, Xianrong Zheng
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Information
Online Access: https://www.mdpi.com/2078-2489/16/3/194
Description
Summary: Pre-trained self-supervised speech models can extract general acoustic features, providing inputs for a variety of downstream speech tasks. Spoofing speech detection, a pressing issue in the age of generative AI, requires both global information and local features of speech. The multi-layer transformer structure in pre-trained speech models effectively captures temporal information and global context, but leaves room for improvement in handling local features. To address this, a spoofing speech detection method that integrates multi-scale features and cross-layer information is proposed. The method introduces a multi-scale feature adapter (MSFA), which strengthens the model's ability to perceive local features through residual convolutional blocks and squeeze-and-excitation (SE) mechanisms. In addition, cross-adaptable weights (CAWs) guide the model toward task-relevant shallow information, enabling effective fusion of features from different layers of the pre-trained model. Experimental results show that the proposed method achieved equal error rates (EERs) of 0.36% and 4.29% on the ASVspoof2019 logical access (LA) and ASVspoof2021 LA datasets, respectively, demonstrating excellent detection performance and generalization ability.
ISSN: 2078-2489
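
Note: The abstract outlines two mechanisms, a multi-scale feature adapter (residual convolutional blocks with squeeze-and-excitation) and cross-adaptable weights that fuse the pre-trained model's layer outputs. The PyTorch sketch below illustrates how such components could be wired together; the module names, kernel sizes, reduction ratio, and softmax-weighted fusion scheme are illustrative assumptions, not the authors' published implementation.

```python
# Minimal sketch of an SE-gated residual conv adapter and learnable
# cross-layer fusion. All hyperparameters here are assumptions for
# illustration, not the paper's reported configuration.
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-excitation: reweight channels using global context."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):              # x: (batch, channels, time)
        s = x.mean(dim=-1)             # squeeze: average over time
        w = self.fc(s).unsqueeze(-1)   # excitation: per-channel weights
        return x * w                   # reweight local feature maps


class MultiScaleFeatureAdapter(nn.Module):
    """Residual conv block + SE applied to one transformer layer's output."""
    def __init__(self, dim, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.conv = nn.Sequential(
            nn.Conv1d(dim, dim, kernel_size, padding=pad),
            nn.ReLU(inplace=True),
            nn.Conv1d(dim, dim, kernel_size, padding=pad),
        )
        self.se = SEBlock(dim)

    def forward(self, x):              # x: (batch, time, dim)
        h = x.transpose(1, 2)          # -> (batch, dim, time) for Conv1d
        h = self.se(self.conv(h))
        return x + h.transpose(1, 2)   # residual connection


class CrossLayerFusion(nn.Module):
    """Learnable per-layer weights over all hidden states of the
    pre-trained model, letting shallow task-relevant layers be emphasized."""
    def __init__(self, num_layers):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_layers))

    def forward(self, hidden_states):  # sequence of (batch, time, dim)
        w = torch.softmax(self.weights, dim=0)
        stacked = torch.stack(list(hidden_states), dim=0)  # (L, B, T, D)
        return (w.view(-1, 1, 1, 1) * stacked).sum(dim=0)  # (B, T, D)
```

As a usage sketch, hidden_states could be the per-layer outputs of a self-supervised speech encoder (e.g., the hidden_states tuple a Hugging Face wav2vec 2.0 model returns when called with output_hidden_states=True), with the fused representation passed to a downstream classification head.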