A Spoofing Speech Detection Method Combining Multi-Scale Features and Cross-Layer Information
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-03-01 |
| Series: | Information |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2078-2489/16/3/194 |
| Summary: | Pre-trained self-supervised speech models extract general acoustic features that serve as inputs for a range of downstream speech tasks. Spoofing speech detection, a pressing issue in the age of generative AI, requires both global information and local features of speech. The multi-layer transformer structure in pre-trained speech models effectively captures temporal information and global context, but there is still room for improvement in how it handles local features. To address this, a spoofing speech detection method that integrates multi-scale features and cross-layer information is proposed. The method introduces a multi-scale feature adapter (MSFA), which enhances the model’s ability to perceive local features through residual convolutional blocks and squeeze-and-excitation (SE) mechanisms. In addition, cross-adaptable weights (CAWs) guide the model to focus on task-relevant shallow information, enabling effective fusion of features from different layers of the pre-trained model. Experimental results show that the proposed method achieves equal error rates (EERs) of 0.36% on the ASVspoof2019 logical access (LA) dataset and 4.29% on the ASVspoof2021 LA dataset, demonstrating strong detection performance and generalization ability. |
|---|---|
| ISSN: | 2078-2489 |
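
The equal error rate (EER) reported in the summary is the operating point at which the false acceptance rate (spoofed speech accepted as bona fide) equals the false rejection rate (bona fide speech rejected). The following is a minimal sketch of that metric only, not the article's evaluation code. The function name `equal_error_rate`, the convention that a higher score means "more likely bona fide", and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumption: a higher score means the trial is more likely bona fide.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (hypothetical, not from the article).
print(equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1]))  # 0.25
```

On real evaluation sets the two error curves rarely cross exactly at an observed score, so evaluation toolkits typically interpolate between neighbouring thresholds; this sketch simply averages the two rates at the closest point.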