Acoustic Model with Multiple Lexicon Types for Indonesian Speech Recognition

Currently, speech recognition datasets are increasingly available freely in various languages. However, speech recognition datasets in the Indonesian language are still challenging to obtain. Consequently, research focusing on speech recognition is challenging to carry out. This research creates Ind...

Bibliographic Details
Main Authors:	Taufik Fuadi Abidin, Alim Misbullah, Ridha Ferdhiana, Laina Farsiah, Muammar Zikri Aksana, Hammam Riza
Format:	Article
Language:	English
Published:	Wiley 2022-01-01
Series:	Applied Computational Intelligence and Soft Computing
Online Access:	http://dx.doi.org/10.1155/2022/3227828

collection	DOAJ
description	Currently, speech recognition datasets are increasingly available freely in various languages. However, speech recognition datasets in the Indonesian language are still challenging to obtain. Consequently, research focusing on speech recognition is challenging to carry out. This research creates Indonesian speech recognition datasets from YouTube channels with subtitles by validating all utterances of downloaded audio to improve the data quality. The quality of the dataset was evaluated using a deep neural network. The time delay neural network (TDNN) was used to build the acoustic model by applying the alignment data from the Gaussian mixture model-hidden Markov model (GMM-HMM). Data augmentation was used to increase the number of validated datasets and enhance the performance of the acoustic model. The results show that the acoustic model built using the validated datasets is better than the unvalidated datasets for all types of lexicons. Utilizing the four lexicon types and increasing the data through augmentation to train the acoustic models can lower the word error rate percentage in the GMM-HMM, TDNN factorization (TDNNF), and CNN-TDNNF-augmented models to 40.85%, 24.96%, and 19.03%, respectively.
id	doaj-art-389dd68431f54c81a8ab59b56dcbf6dd
institution	Kabale University
issn	1687-9732
affiliations	Taufik Fuadi Abidin (Department of Informatics); Alim Misbullah (Department of Informatics); Ridha Ferdhiana (Department of Statistics); Laina Farsiah (Department of Informatics); Muammar Zikri Aksana (Department of Informatics); Hammam Riza (National Research and Innovation Agency)
