Multitask Learning with Local Attention for Tibetan Speech Recognition
In this paper, we propose incorporating local attention into WaveNet-CTC to improve Tibetan speech recognition in multitask learning. As the number of tasks increases, for example simultaneous Tibetan speech content recognition, dialect identification, and speaker recognition, the a...
Main Authors: | Hui Wang, Fei Gao, Yue Zhao, Li Yang, Jianjian Yue, Huilin Ma |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2020-01-01 |
Series: | Complexity |
Online Access: | http://dx.doi.org/10.1155/2020/8894566 |
_version_ | 1832546933846048768 |
---|---|
author | Hui Wang Fei Gao Yue Zhao Li Yang Jianjian Yue Huilin Ma |
collection | DOAJ |
description | In this paper, we propose incorporating local attention into WaveNet-CTC to improve Tibetan speech recognition in multitask learning. As the number of tasks increases, for example when Tibetan speech content recognition, dialect identification, and speaker recognition are performed simultaneously, the speech recognition accuracy of a single WaveNet-CTC model decreases. Inspired by the attention mechanism, we introduce local attention to automatically tune the weights of the feature frames in a window and to attend differently to context information in multitask learning. Experimental results show that, compared with the baseline model, our method improves speech recognition accuracy for all Tibetan dialects in three-task learning. Furthermore, our method significantly improves accuracy for the low-resource dialect by 5.11% over the dialect-specific model. |
format | Article |
id | doaj-art-3a20171cf27c44d89e71c13848ccc61f |
institution | Kabale University |
issn | 1076-2787 1099-0526 |
language | English |
publishDate | 2020-01-01 |
publisher | Wiley |
record_format | Article |
series | Complexity |
spelling | doaj-art-3a20171cf27c44d89e71c13848ccc61f; 2025-02-03T06:46:30Z; eng; Wiley; Complexity; 1076-2787; 1099-0526; 2020-01-01; 2020; 10.1155/2020/8894566; 8894566; Multitask Learning with Local Attention for Tibetan Speech Recognition; Hui Wang, Fei Gao, Yue Zhao, Li Yang, Jianjian Yue, Huilin Ma (all: School of Information Engineering, Minzu University of China, Beijing 100081, China); http://dx.doi.org/10.1155/2020/8894566 |
title | Multitask Learning with Local Attention for Tibetan Speech Recognition |
url | http://dx.doi.org/10.1155/2020/8894566 |
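
The description in this record says that local attention tunes the weights of feature frames in a window. The sketch below illustrates that general idea only: for each time step it scores the frames in a fixed window, normalizes the scores with a softmax, and returns the weighted average as a context vector. The function name `local_attention`, the `query` scoring vector, and the `half_width` parameter are hypothetical and chosen for illustration; the record does not give the paper's exact formulation or how attention is wired into WaveNet-CTC, so this should not be read as the authors' implementation.

```python
import math


def local_attention(frames, query, half_width):
    """Softmax-weighted average of the frames in a window around each step.

    frames: list of T feature frames, each a list of D floats.
    query: list of D floats used to score frames (hypothetical stand-in
        for learned attention parameters).
    half_width: number of frames on each side of the centre frame.
    Returns (contexts, weights): one context vector and one weight list
    per time step.
    """
    dim = len(frames[0])
    # One relevance score per frame: dot product with the query vector.
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]

    contexts, all_weights = [], []
    for t in range(len(frames)):
        # Clip the window at the sequence boundaries.
        lo = max(0, t - half_width)
        hi = min(len(frames), t + half_width + 1)

        # Numerically stable softmax over the scores inside the window.
        window_scores = scores[lo:hi]
        peak = max(window_scores)
        exps = [math.exp(s - peak) for s in window_scores]
        total = sum(exps)
        weights = [e / total for e in exps]

        # Context vector: attention-weighted sum of the windowed frames.
        context = [
            sum(w * frames[lo + k][d] for k, w in enumerate(weights))
            for d in range(dim)
        ]
        contexts.append(context)
        all_weights.append(weights)
    return contexts, all_weights


if __name__ == "__main__":
    toy_frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
    ctx, w = local_attention(toy_frames, query=[0.5, -0.5], half_width=1)
    for step, (c, ws) in enumerate(zip(ctx, w)):
        print(step, c, ws)
```

In a trained model the scores would come from learned parameters rather than a fixed `query`, and a multitask setup could plausibly learn different weightings for different tasks; that part is an assumption and is not shown here.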