ZeST: A Zero-Resourced Speech-to-Speech Translation Approach for Unknown, Unpaired, and Untranscribed Languages
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
Subjects: | |
Online Access: | https://ieeexplore.ieee.org/document/10833610/ |
Summary: | Speech-to-speech translation (S2ST) has emerged as a practical solution for overcoming linguistic barriers, enabling direct translation between spoken languages without relying on intermediate text representations. However, existing S2ST systems face significant challenges, including the need for extensive parallel speech data and a dependence on languages with a known written form. This paper proposes ZeST, a novel zero-resourced approach to speech-to-speech translation that addresses the challenges of processing unknown, unpaired, and untranscribed languages. ZeST consists of two main phases: (1) discovering semantically related speech pairs from unpaired data by leveraging self-supervised visually grounded speech (VGS) models, and (2) performing textless speech-to-speech translation for untranscribed languages using discrete speech representations and sequence-to-sequence modeling. Experimental evaluations across three data scenarios demonstrate that the ZeST system effectively performs direct speech-to-speech translation without relying on transcribed data or parallel corpora. These results highlight the potential of ZeST to contribute to zero-resourced speech processing and to improve communication in multilingual societies. |
---|---|
ISSN: | 2169-3536 |
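The abstract describes the two phases of ZeST only at a high level and gives no implementation details. As an illustration only, the following Python sketch shows one generic way each phase could be realized. It assumes that each utterance already has an embedding in a shared, image-grounded space (as a VGS model might provide) and that frame-level speech features and cluster centroids are available. Phase (1) is sketched as mutual-nearest-neighbour mining by cosine similarity, and phase (2) as nearest-centroid quantization into discrete units with consecutive repeats collapsed. The function names `mine_pairs` and `to_units`, the `threshold` parameter, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mine_pairs(src: Dict[str, Vector], tgt: Dict[str, Vector],
               threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Hypothetical phase-(1) sketch: pair utterances from two unpaired
    collections by keeping mutual nearest neighbours whose cosine similarity
    in an assumed shared (image-grounded) embedding space reaches `threshold`.
    Both collections must be non-empty."""
    def nearest(query: Vector, pool: Dict[str, Vector]) -> Tuple[str, float]:
        # Best-scoring key in `pool`; ties go to the first key encountered.
        return max(((k, cosine(query, v)) for k, v in pool.items()),
                   key=lambda kv: kv[1])

    pairs = []
    for s_id, s_vec in src.items():
        t_id, score = nearest(s_vec, tgt)
        back_id, _ = nearest(tgt[t_id], src)
        # Keep the pair only if the match holds in both directions.
        if back_id == s_id and score >= threshold:
            pairs.append((s_id, t_id, score))
    return pairs


def to_units(frames: Sequence[Vector], centroids: Sequence[Vector]) -> List[int]:
    """Hypothetical phase-(2) sketch: map each frame feature to the index of
    its nearest centroid (squared Euclidean distance), then collapse runs of
    identical consecutive units into one."""
    units: List[int] = []
    for f in frames:
        idx = min(range(len(centroids)),
                  key=lambda i: sum((a - b) ** 2 for a, b in zip(f, centroids[i])))
        if not units or units[-1] != idx:
            units.append(idx)
    return units
```

For example, with toy 2-D embeddings `{"a1": [1.0, 0.0], "a2": [0.0, 1.0]}` and `{"b1": [0.9, 0.1], "b2": [0.1, 0.9]}`, `mine_pairs` returns the pairs `("a1", "b1")` and `("a2", "b2")`, each with a cosine score of about 0.994. Mutual-nearest-neighbour filtering is used here only because it is a common, conservative way to reject one-sided matches; the paper may use a different pairing criterion and a different unit extractor.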