On the Development of Speech Resources for the Mixtec Language

Bibliographic Details
Main Author: Santiago-Omar Caballero-Morales
Format: Article
Language: English
Published: Wiley, 2013-01-01
Series: The Scientific World Journal
Online Access: http://dx.doi.org/10.1155/2013/170649
ISSN: 1537-744X
Author affiliation: Technological University of the Mixteca, Road to Acatlima K.m. 2.5, 69000 Huajuapan de León, OAX, Mexico

Description

The Mixtec language is one of the main native languages of Mexico. In general, owing to urbanization, discrimination, and limited efforts to promote the culture, native languages are disappearing. Most of the available information about Mixtec is in written form, such as dictionaries, which, although they include examples of how to pronounce Mixtec words, are not as reliable as hearing the correct pronunciation from a native speaker. Formal acoustic resources, such as speech corpora, are almost non-existent for Mixtec, and no speech technologies are known to have been developed for it. This paper presents the development of the following resources for the Mixtec language: (1) a speech database of traditional narratives of the Mixtec culture spoken by a native speaker, labelled at the phonetic and orthographic levels by means of spectral analysis; and (2) a native speaker-adaptive automatic speech recognition (ASR) system, trained with the speech database and integrated with a Mixtec-to-Spanish/Spanish-to-Mixtec text translator. Although small and limited to a single variant, the speech database was reliable enough to build a multiuser speech application, which achieved a mean recognition/translation performance of up to 94.36% in experiments with non-native speakers (the target users).