On the Development of Speech Resources for the Mixtec Language
Main Author: | |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley 2013-01-01 |
Series: | The Scientific World Journal |
Online Access: | http://dx.doi.org/10.1155/2013/170649 |
Summary: | The Mixtec language is one of the main native languages in Mexico. In general, due to urbanization, discrimination, and limited attempts to promote the culture, the native languages are disappearing. Most of the information available about the Mixtec language is in written form, as in dictionaries which, although they include examples of how to pronounce Mixtec words, are not as reliable as listening to the correct pronunciation from a native speaker. Formal acoustic resources, such as speech corpora, are almost non-existent for Mixtec, and no speech technologies are known to have been developed for it. This paper presents the development of the following resources for the Mixtec language: (1) a speech database of traditional narratives of the Mixtec culture spoken by a native speaker (labelled at the phonetic and orthographic levels by means of spectral analysis) and (2) a native speaker-adaptive automatic speech recognition (ASR) system (trained with the speech database) integrated with a Mixtec-to-Spanish/Spanish-to-Mixtec text translator. The speech database, although small and limited to a single variant, was reliable enough to build the multiuser speech application, which achieved a mean recognition/translation performance of up to 94.36% in experiments with non-native speakers (the target users). |
---|---|
ISSN: | 1537-744X |