Arabic Speech Recognition Based on Encoder-Decoder Architecture of Transformer


Bibliographic Details
Main Authors: Mohanad Sameer, Ahmed Talib, Alla Hussein, Husniza Husni
Format: Article
Language:English
Published: Middle Technical University, 2023-03-01
Series:Journal of Techniques
Subjects: Sequence to Sequence ASR; Arabic ASR; Transformer-Speech Recognition; Arabic Speech to Text
Online Access:https://journal.mtu.edu.iq/index.php/MTU/article/view/749
collection DOAJ
description Recognizing and transcribing human speech has become an increasingly important task. Recently, researchers have shown growing interest in automatic speech recognition (ASR) using end-to-end models. Previous choices for Arabic ASR architectures have been time-delay neural networks, recurrent neural networks (RNNs), and long short-term memory (LSTM) networks. Previous end-to-end approaches have suffered from slow training and inference because of limited training parallelization, and they require a large amount of data to achieve acceptable results on Arabic speech. This research presents an Arabic speech recognition system based on a transformer encoder-decoder architecture with self-attention that transcribes Arabic audio segments into text and can be trained faster and more efficiently. The proposed model exceeds the performance of previous end-to-end approaches on the Common Voice dataset from Mozilla. We introduce a speech-transformer model trained over 110 epochs using only 112 hours of speech. Although Arabic is considered one of the languages that are difficult for speech recognition systems to interpret, we achieved a best word error rate (WER) of 3.2 compared to other systems whose training requires a very large amount of data. The proposed system was evaluated on the Common Voice 8.0 dataset without using a language model.
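The self-attention operation at the core of the transformer encoder-decoder described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the frame count, feature dimension, and identity Q/K/V projections are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # (T_q, T_k) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V, weights

# Toy "audio frames": 4 frames of 8-dimensional acoustic features.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))

# In self-attention, Q, K, and V are all projections of the same sequence;
# identity projections are used here for brevity.
out, w = scaled_dot_product_attention(X, X, X)
print(out.shape)                          # (4, 8): one context vector per frame
print(np.allclose(w.sum(axis=1), 1.0))    # True: each attention row sums to 1
```

Because every frame attends to every other frame in a single matrix product, the whole sequence is processed in parallel, which is the parallelization advantage over RNN/LSTM recurrence that the abstract highlights.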
id doaj-art-45ef8eb4b7b24186a9cb668e48c3e326
institution Kabale University
issn 1818-653X
2708-8383
doi 10.51173/jt.v5i1.749
container Journal of Techniques, Vol. 5, No. 1, 2023-03-01, Middle Technical University
affiliation Mohanad Sameer: Technical College of Management - Baghdad, Middle Technical University, Baghdad, Iraq
affiliation Ahmed Talib: Technical College of Management - Baghdad, Middle Technical University, Baghdad, Iraq
affiliation Alla Hussein: Technical Institute / Kut, Middle Technical University, Baghdad, Iraq
affiliation Husniza Husni: Universiti Utara Malaysia, 06010 Sintok, Kedah, Malaysia
topic Sequence to Sequence ASR
Arabic ASR
Transformer-Speech Recognition
Arabic Speech to Text