Acoustic Model with Multiple Lexicon Types for Indonesian Speech Recognition
Currently, speech recognition datasets are increasingly available freely in various languages. However, speech recognition datasets in the Indonesian language are still challenging to obtain. Consequently, research focusing on speech recognition is challenging to carry out. This research creates Ind...
Main Authors: | Taufik Fuadi Abidin, Alim Misbullah, Ridha Ferdhiana, Laina Farsiah, Muammar Zikri Aksana, Hammam Riza |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2022-01-01 |
Series: | Applied Computational Intelligence and Soft Computing |
Online Access: | http://dx.doi.org/10.1155/2022/3227828 |
_version_ | 1832563282414665728 |
---|---|
author | Taufik Fuadi Abidin Alim Misbullah Ridha Ferdhiana Laina Farsiah Muammar Zikri Aksana Hammam Riza |
author_sort | Taufik Fuadi Abidin |
collection | DOAJ |
description | Currently, speech recognition datasets are increasingly available freely in various languages. However, speech recognition datasets in the Indonesian language are still challenging to obtain. Consequently, research focusing on speech recognition is challenging to carry out. This research creates Indonesian speech recognition datasets from YouTube channels with subtitles by validating all utterances of downloaded audio to improve the data quality. The quality of the dataset was evaluated using a deep neural network. The time delay neural network (TDNN) was used to build the acoustic model by applying the alignment data from the Gaussian mixture model-hidden Markov model (GMM-HMM). Data augmentation was used to increase the number of validated datasets and enhance the performance of the acoustic model. The results show that the acoustic model built using the validated datasets is better than the unvalidated datasets for all types of lexicons. Utilizing the four lexicon types and increasing the data through augmentation to train the acoustic models can lower the word error rate percentage in the GMM-HMM, TDNN factorization (TDNNF), and CNN-TDNNF-augmented models to 40.85%, 24.96%, and 19.03%, respectively. |
format | Article |
id | doaj-art-389dd68431f54c81a8ab59b56dcbf6dd |
institution | Kabale University |
issn | 1687-9732 |
language | English |
publishDate | 2022-01-01 |
publisher | Wiley |
record_format | Article |
series | Applied Computational Intelligence and Soft Computing |
spelling | doaj-art-389dd68431f54c81a8ab59b56dcbf6dd; 2025-02-03T01:20:35Z; eng; Wiley; Applied Computational Intelligence and Soft Computing; 1687-9732; 2022-01-01; 2022; 10.1155/2022/3227828; Acoustic Model with Multiple Lexicon Types for Indonesian Speech Recognition; Taufik Fuadi Abidin (Department of Informatics), Alim Misbullah (Department of Informatics), Ridha Ferdhiana (Department of Statistics), Laina Farsiah (Department of Informatics), Muammar Zikri Aksana (Department of Informatics), Hammam Riza (National Research and Innovation Agency); http://dx.doi.org/10.1155/2022/3227828 |
title | Acoustic Model with Multiple Lexicon Types for Indonesian Speech Recognition |
url | http://dx.doi.org/10.1155/2022/3227828 |