Echo: A crowd-sourced Romanian speech dataset.

Bibliographic Details
Main Authors: Remus-Dan Ungureanu, Mihai Dascalu
Format: Article
Language: English
Published: ASLERD 2024-11-01
Series: Interaction Design and Architecture(s)
Online Access: https://ixdea.org/62_9/
Description
Summary: Romanian is the seventh most popular European language, with around 30 million speakers worldwide. Despite its popularity, the available speech resources are limited. As a result, there are few models that transcribe Romanian well, most of them being multilingual models that also cover less popular languages. Echo is a crowd-sourcing platform that has collected more than 300 hours of speech from various contributors. In this study, we document how a large speech dataset enables researchers to train automatic speech recognition, speaker verification, and diarization models to automatically process students’ notes. We publicly release both the dataset and the Whisper-based baseline model as open-source.
ISSN: 2283-2998