Acoustic Model with Multiple Lexicon Types for Indonesian Speech Recognition

Bibliographic Details
Main Authors: Taufik Fuadi Abidin, Alim Misbullah, Ridha Ferdhiana, Laina Farsiah, Muammar Zikri Aksana, Hammam Riza
Format: Article
Language: English
Published: Wiley 2022-01-01
Series: Applied Computational Intelligence and Soft Computing
Online Access: http://dx.doi.org/10.1155/2022/3227828
collection DOAJ
description Speech recognition datasets are increasingly freely available in many languages, but Indonesian datasets remain hard to obtain, which makes speech recognition research for the language difficult to carry out. This research creates Indonesian speech recognition datasets from subtitled YouTube channels and validates all utterances of the downloaded audio to improve data quality. The quality of the dataset was evaluated using a deep neural network. A time delay neural network (TDNN) acoustic model was built using alignments from a Gaussian mixture model-hidden Markov model (GMM-HMM). Data augmentation was used to increase the amount of validated data and enhance the performance of the acoustic model. The results show that the acoustic model built using the validated datasets performs better than the one built using the unvalidated datasets for all types of lexicons. Using the four lexicon types and increasing the data through augmentation to train the acoustic models lowers the word error rate of the GMM-HMM, factorized TDNN (TDNNF), and CNN-TDNNF-augmented models to 40.85%, 24.96%, and 19.03%, respectively.
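The description reports results as word error rate (WER) percentages. As a minimal sketch of how that metric is conventionally computed (this is the standard word-level edit-distance definition, not code from the article; the function name `word_error_rate` and the sample sentences are hypothetical and chosen only for illustration):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: (substitutions + deletions + insertions) / N * 100,
    where N is the number of words in the reference transcript."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# Made-up example: one of four reference words is dropped by the recognizer.
wer = word_error_rate("saya pergi ke pasar", "saya pergi pasar")
```

Because insertions count as errors, WER computed this way can exceed 100% when the hypothesis is much longer than the reference.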
id doaj-art-389dd68431f54c81a8ab59b56dcbf6dd
institution Kabale University
issn 1687-9732
affiliations Taufik Fuadi Abidin (Department of Informatics); Alim Misbullah (Department of Informatics); Ridha Ferdhiana (Department of Statistics); Laina Farsiah (Department of Informatics); Muammar Zikri Aksana (Department of Informatics); Hammam Riza (National Research and Innovation Agency)