ZeST: A Zero-Resourced Speech-to-Speech Translation Approach for Unknown, Unpaired, and Untranscribed Languages

Bibliographic Details
Main Authors: Luan Thanh Nguyen, Sakriani Sakti
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Subjects: Speech-to-speech translation; self-supervised speech representation; zero-resourced
Online Access: https://ieeexplore.ieee.org/document/10833610/
author Luan Thanh Nguyen
Sakriani Sakti
collection DOAJ
description Speech-to-speech translation (S2ST) has emerged as a practical solution for overcoming linguistic barriers, enabling direct translation between spoken languages without relying on intermediate text representations. However, existing S2ST systems face significant challenges, including the requirement for extensive parallel speech data and the limitations of known written languages. This paper proposes ZeST, a novel zero-resourced approach to speech-to-speech translation that addresses the challenges of processing unknown, unpaired, and untranscribed languages. ZeST consists of two main phases: (1) discovering semantically related speech pairs from unpaired data by leveraging self-supervised visually grounded speech (VGS) models, and (2) achieving textless speech-to-speech translation for untranscribed languages using discrete speech representations and sequence-to-sequence modeling. Experimental evaluations using three different data scenarios demonstrate that the ZeST system effectively performs direct speech-to-speech translation without relying on transcribed data or parallel corpora. The experimental results highlight the potential of ZeST in contributing to the field of zero-resourced speech processing and improving communication in multilingual societies.
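The abstract's two-phase pipeline can be sketched in miniature as follows. This is a hypothetical illustration, not the authors' implementation: the toy "embeddings" stand in for VGS-model outputs, the unit sequences for discrete speech representations, and a lookup table stands in for the sequence-to-sequence model the paper actually trains.

```python
# Toy sketch of the two ZeST phases (illustrative only; all data,
# function names, and the lookup-table "model" are made up).
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def discover_pairs(src_embs, tgt_embs, threshold=0.9):
    """Phase 1: match each source utterance to its most similar target
    utterance in a shared (VGS-style) semantic embedding space,
    keeping only confident matches from the unpaired data."""
    pairs = []
    for i, se in enumerate(src_embs):
        j, score = max(((j, cosine(se, te)) for j, te in enumerate(tgt_embs)),
                       key=lambda item: item[1])
        if score >= threshold:
            pairs.append((i, j))
    return pairs

def train_unit_mapper(pairs, src_units, tgt_units):
    """Phase 2 stand-in: a lookup table from source discrete-unit
    sequences to target ones; the paper instead trains a
    sequence-to-sequence model over such units."""
    return {tuple(src_units[i]): tgt_units[j] for i, j in pairs}

# Example with toy 2-D "embeddings" and made-up unit sequences.
src_embs = [[1.0, 0.0], [0.0, 1.0]]
tgt_embs = [[0.9, 0.1], [0.1, 0.9]]
pairs = discover_pairs(src_embs, tgt_embs)          # → [(0, 0), (1, 1)]
mapper = train_unit_mapper(pairs, [[5, 7], [2, 2, 9]], [[11, 3], [8]])
```

The key property being illustrated is that neither phase ever touches text: pairing happens in a shared semantic space, and translation operates on discrete unit sequences.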
format Article
id doaj-art-c4da6a95e484428b8165a874876c720e
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-c4da6a95e484428b8165a874876c720e
timestamp 2025-01-21T00:02:13Z
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2025-01-01
volume 13
pages 8638-8648
doi 10.1109/ACCESS.2025.3527012
ieee_document 10833610
author Luan Thanh Nguyen (https://orcid.org/0000-0003-4882-8336); Sakriani Sakti (https://orcid.org/0000-0001-5509-8963)
affiliation Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan (both authors)
url https://ieeexplore.ieee.org/document/10833610/
topic Speech-to-speech translation; self-supervised speech representation; zero-resourced
title ZeST: A Zero-Resourced Speech-to-Speech Translation Approach for Unknown, Unpaired, and Untranscribed Languages
topic Speech-to-speech translation
self-supervised speech representation
zero-resourced
url https://ieeexplore.ieee.org/document/10833610/