Dual-Channel Deepfake Audio Detection: Leveraging Direct and Reverberant Waveforms


Bibliographic Details
Main Authors: Gunwoo Lee, Jungmin Lee, Minkyo Jung, Joseph Lee, Kihun Hong, Souhwan Jung, Yoseob Han
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/10849546/
Description
Summary: Deepfake content, including audio, video, images, and text synthesized or modified using artificial intelligence, is designed to convincingly mimic real content. As deepfake generation technology advances, detecting deepfake content presents significant challenges. While recent progress has been made in detection techniques, identifying deepfake audio remains particularly challenging. Previous approaches have attempted to capture deepfake features by combining video and audio content; however, these methods are ineffective when video and audio are mismatched due to occlusion. To address this, we propose a novel dual-channel deepfake audio detection model that leverages the direct and reverberant components extracted from raw audio signals, focusing exclusively on audio-based detection without reliance on video content. Across various datasets, including ASVspoof2019, FakeAVCeleb, and sport press conference datasets collected by our group, the proposed dual-channel model demonstrates significant improvements in quantitative metrics such as equal error rate and area under the curve. The implementation is available at https://github.com/gunwoo5034/Dual-Channel-Audio-Deepfake-Detection.
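The abstract does not specify how the direct and reverberant components are separated from a raw waveform. One common heuristic, sketched below in NumPy, splits a room impulse response at a short window past its direct-path peak; by linearity of convolution, convolving a signal with each part yields a direct channel and a reverberant channel that sum exactly to the full reverberant observation. All names, parameters, and the synthetic signal here are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def split_direct_reverberant(rir, sr, direct_ms=5.0):
    """Split a room impulse response (RIR) into direct and reverberant parts.

    The direct part keeps everything up to a short window past the main
    peak; the remainder is treated as reverberation. Illustrative
    heuristic only; the window length (direct_ms) is an assumption.
    """
    peak = int(np.argmax(np.abs(rir)))
    cut = peak + int(sr * direct_ms / 1000)
    direct = np.zeros_like(rir)
    direct[:cut] = rir[:cut]
    reverb = rir - direct
    return direct, reverb

sr = 16000
t = np.arange(sr) / sr
speech = np.sin(2 * np.pi * 220 * t)  # stand-in for a one-second speech signal

# Synthetic RIR: unit direct path plus an exponentially decaying diffuse tail.
rir = np.zeros(sr // 4)
rir[0] = 1.0
rng = np.random.default_rng(0)
decay = np.exp(-6 * np.arange(rir.size) / rir.size)
rir[1:] = 0.2 * rng.standard_normal(rir.size - 1) * decay[1:]

d_rir, r_rir = split_direct_reverberant(rir, sr)
direct_ch = np.convolve(speech, d_rir)[: speech.size]
reverb_ch = np.convolve(speech, r_rir)[: speech.size]

# Convolution is linear, so the two channels reconstruct the full observation.
full = np.convolve(speech, rir)[: speech.size]
assert np.allclose(direct_ch + reverb_ch, full)
```

The two channels produced this way could then be fed to the two branches of a dual-channel detector; the actual separation method used by the authors is described in the paper itself.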
ISSN: 2169-3536