DDFNet: A Dual-Domain Fusion Network for Robust Synthetic Speech Detection
| Main Authors: | |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-03-01 |
| Series: | Big Data and Cognitive Computing |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2504-2289/9/3/58 |
| Summary: | The detection of synthetic speech has become a pressing challenge due to the potential societal risks posed by synthetic speech technologies. Existing methods primarily focus on either the time or frequency domain of speech, limiting their ability to generalize to new and diverse speech synthesis algorithms. In this work, we present a novel and scientifically grounded approach, the Dual-domain Fusion Network (DDFNet), which synergistically integrates features from both the time and frequency domains to capture complementary information. The architecture consists of two specialized single-domain feature extraction networks, each optimized for the unique characteristics of its respective domain, and a feature fusion network that effectively combines these features at a deep level. Moreover, we incorporate multi-task learning to simultaneously capture rich, multi-faceted representations, further enhancing the model’s generalization capability. Extensive experiments on the ASVspoof 2019 Logical Access corpus and ASVspoof 2021 tracks demonstrate that DDFNet achieves strong performance, maintaining competitive results despite the challenges posed by channel changes and compression coding, highlighting its robust generalization ability. |
|---|---|
| ISSN: | 2504-2289 |