Multitask Learning with Local Attention for Tibetan Speech Recognition

In this paper, we propose incorporating local attention into WaveNet-CTC to improve the performance of Tibetan speech recognition in multitask learning. With an increase in the number of tasks, such as simultaneous Tibetan speech content recognition, dialect identification, and speaker recognition, the a...

Bibliographic Details
Main Authors: Hui Wang, Fei Gao, Yue Zhao, Li Yang, Jianjian Yue, Huilin Ma
Format: Article
Language: English
Published: Wiley 2020-01-01
Series: Complexity
Online Access: http://dx.doi.org/10.1155/2020/8894566
author Hui Wang
Fei Gao
Yue Zhao
Li Yang
Jianjian Yue
Huilin Ma
collection DOAJ
description In this paper, we propose incorporating local attention into WaveNet-CTC to improve the performance of Tibetan speech recognition in multitask learning. With an increase in the number of tasks, such as simultaneous Tibetan speech content recognition, dialect identification, and speaker recognition, the accuracy of a single WaveNet-CTC model on speech recognition decreases. Inspired by the attention mechanism, we introduce local attention to automatically tune the weights of feature frames in a window and to pay different amounts of attention to context information for multitask learning. The experimental results show that, compared with the baseline model, our method improves speech recognition accuracy for all Tibetan dialects in three-task learning. Furthermore, our method significantly improves the accuracy for the low-resource dialect, by 5.11% compared with the dialect-specific model.
format Article
id doaj-art-3a20171cf27c44d89e71c13848ccc61f
institution Kabale University
issn 1076-2787
1099-0526
language English
publishDate 2020-01-01
publisher Wiley
record_format Article
series Complexity
spelling doaj-art-3a20171cf27c44d89e71c13848ccc61f
2025-02-03T06:46:30Z
eng
Wiley
Complexity
1076-2787
1099-0526
2020-01-01
2020
10.1155/2020/8894566
8894566
Multitask Learning with Local Attention for Tibetan Speech Recognition
Hui Wang, School of Information Engineering, Minzu University of China, Beijing 100081, China
Fei Gao, School of Information Engineering, Minzu University of China, Beijing 100081, China
Yue Zhao, School of Information Engineering, Minzu University of China, Beijing 100081, China
Li Yang, School of Information Engineering, Minzu University of China, Beijing 100081, China
Jianjian Yue, School of Information Engineering, Minzu University of China, Beijing 100081, China
Huilin Ma, School of Information Engineering, Minzu University of China, Beijing 100081, China
title Multitask Learning with Local Attention for Tibetan Speech Recognition
url http://dx.doi.org/10.1155/2020/8894566
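
Illustrative note: the description in this record says that local attention tunes the weights of feature frames inside a window. The sketch below shows one generic way such a windowed weighting could be computed. It is not the authors' WaveNet-CTC implementation: the scaled dot-product scoring function, the half_width parameter, and the names local_attention and softmax are assumptions introduced only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_attention(frames, half_width):
    """Re-weight each frame using only its neighbours inside a window.

    frames: list of equal-length feature vectors (lists of floats).
    half_width: hypothetical parameter; the window for frame t covers
        frames t - half_width .. t + half_width, clipped at the edges.
    Returns (contexts, weights): one context vector and one list of
    attention weights per frame.
    """
    if not frames:
        return [], []
    dim = len(frames[0])
    contexts, all_weights = [], []
    for t, query in enumerate(frames):
        lo = max(0, t - half_width)
        hi = min(len(frames), t + half_width + 1)
        window = frames[lo:hi]
        # Assumed scoring function: scaled dot product with the centre frame.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in window
        ]
        weights = softmax(scores)
        # Context vector: attention-weighted sum of the frames in the window.
        context = [
            sum(w * frame[d] for w, frame in zip(weights, window))
            for d in range(dim)
        ]
        contexts.append(context)
        all_weights.append(weights)
    return contexts, all_weights


if __name__ == "__main__":
    # Toy 2-dimensional "feature frames"; real inputs would be acoustic features.
    toy_frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    contexts, weights = local_attention(toy_frames, half_width=1)
    for t, w in enumerate(weights):
        print(t, [round(x, 3) for x in w])
```

In this sketch each frame attends only to its immediate neighbours, so the weights within every window sum to 1 and frames that resemble the centre frame receive larger weights. In a trained model the scoring function would normally have learned parameters, which this sketch omits.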