Arabic Speech Recognition Based on Encoder-Decoder Architecture of Transformer
Recognizing and transcribing human speech has become an increasingly important task. Recently, researchers have become more interested in automatic speech recognition (ASR) using end-to-end models. Previous choices for Arabic ASR architectures have been time-delay neural networks, recurrent neural...
Main Authors: | Mohanad Sameer, Ahmed Talib, Alla Hussein, Husniza Husni |
Format: | Article |
Language: | English |
Published: | Middle Technical University, 2023-03-01 |
Series: | Journal of Techniques |
Subjects: | Sequence to Sequence ASR; Arabic ASR; Transformer-Speech Recognition; Arabic Speech to Text |
Online Access: | https://journal.mtu.edu.iq/index.php/MTU/article/view/749 |
_version_ | 1832595110174392320 |
author | Mohanad Sameer; Ahmed Talib; Alla Hussein; Husniza Husni |
author_facet | Mohanad Sameer; Ahmed Talib; Alla Hussein; Husniza Husni |
author_sort | Mohanad Sameer |
collection | DOAJ |
description |
Recognizing and transcribing human speech has become an increasingly important task. Recently, researchers have become more interested in automatic speech recognition (ASR) using end-to-end models. Previous choices for Arabic ASR architectures have been time-delay neural networks, recurrent neural networks (RNN), and long short-term memory (LSTM). Previous end-to-end approaches have suffered from slow training and inference because of limits on training parallelization, and they require a large amount of data to achieve acceptable results in recognizing Arabic speech. This research presents an Arabic speech recognition system based on a transformer encoder-decoder architecture with self-attention that transcribes Arabic audio speech segments into text and can be trained faster and more efficiently. The proposed model exceeds the performance of previous end-to-end approaches on the Common Voice dataset from Mozilla. In this research, we introduce a speech-transformer model trained over 110 epochs using only 112 hours of speech. Although Arabic is considered one of the languages that are difficult for speech recognition systems to interpret, we achieved a best word error rate (WER) of 3.2, compared to other systems whose training requires a very large amount of data. The proposed system was evaluated on the Common Voice 8.0 dataset without a language model.
|
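The abstract's central technical claim rests on self-attention inside a transformer encoder-decoder. As an illustrative aside, here is a minimal pure-Python sketch of scaled dot-product attention, the core operation of that architecture; the function names and toy vectors are our own, not taken from the paper.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    # Scaled dot-product attention: each query attends over all keys,
    # producing a convex combination of the value vectors.
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

def self_attention(X):
    # Self-attention, as in the transformer encoder: queries, keys,
    # and values all come from the same sequence.
    return attention(X, X, X)
```

In the full model, Q, K, and V are learned linear projections of acoustic (encoder) or text (decoder) representations, and multiple attention heads run in parallel; the sketch shows only the attention arithmetic itself.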
format | Article |
id | doaj-art-45ef8eb4b7b24186a9cb668e48c3e326 |
institution | Kabale University |
issn | 1818-653X 2708-8383 |
language | English |
publishDate | 2023-03-01 |
publisher | middle technical university |
record_format | Article |
series | Journal of Techniques |
spelling | doaj-art-45ef8eb4b7b24186a9cb668e48c3e326 2025-01-19T11:02:00Z eng Middle Technical University, Journal of Techniques, ISSN 1818-653X, 2708-8383, 2023-03-01, vol. 5, no. 1, DOI 10.51173/jt.v5i1.749. Arabic Speech Recognition Based on Encoder-Decoder Architecture of Transformer. Mohanad Sameer (Technical College of Management - Baghdad, Middle Technical University, Baghdad, Iraq); Ahmed Talib (Technical College of Management - Baghdad, Middle Technical University, Baghdad, Iraq); Alla Hussein (Technical Institute / Kut, Middle Technical University, Baghdad, Iraq); Husniza Husni (Universiti Utara Malaysia, 06010 Sintok, Kedah, Malaysia). https://journal.mtu.edu.iq/index.php/MTU/article/view/749. Keywords: Sequence to Sequence ASR; Arabic ASR; Transformer-Speech Recognition; Arabic Speech to Text |
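The record's headline result is a word error rate (WER) of 3.2. WER is the word-level edit distance (substitutions, insertions, and deletions) between the hypothesis and reference transcripts, divided by the number of reference words. A minimal sketch of the metric, assuming a standard Levenshtein formulation (the `wer` helper is our own, not from the paper):

```python
def wer(reference: str, hypothesis: str) -> float:
    # Word error rate: word-level Levenshtein distance between the
    # reference and hypothesis, normalized by reference length.
    r, h = reference.split(), hypothesis.split()
    # Single-row dynamic-programming table for edit distance
    d = list(range(len(h) + 1))
    for i in range(1, len(r) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(h) + 1):
            cur = d[j]
            d[j] = min(d[j] + 1,                        # deletion
                       d[j - 1] + 1,                    # insertion
                       prev + (r[i - 1] != h[j - 1]))   # substitution/match
            prev = cur
    return d[len(h)] / len(r)
```

For example, `wer("the cat sat", "the hat sat")` has one substitution over three reference words, i.e. 1/3; reported WER values such as the 3.2 above are typically this ratio computed over a whole test set and expressed as a percentage.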
spellingShingle | Mohanad Sameer; Ahmed Talib; Alla Hussein; Husniza Husni; Arabic Speech Recognition Based on Encoder-Decoder Architecture of Transformer; Journal of Techniques; Sequence to Sequence ASR; Arabic ASR; Transformer-Speech Recognition; Arabic Speech to Text |
title | Arabic Speech Recognition Based on Encoder-Decoder Architecture of Transformer |
title_full | Arabic Speech Recognition Based on Encoder-Decoder Architecture of Transformer |
title_fullStr | Arabic Speech Recognition Based on Encoder-Decoder Architecture of Transformer |
title_full_unstemmed | Arabic Speech Recognition Based on Encoder-Decoder Architecture of Transformer |
title_short | Arabic Speech Recognition Based on Encoder-Decoder Architecture of Transformer |
title_sort | arabic speech recognition based on encoder decoder architecture of transformer |
topic | Sequence to Sequence ASR; Arabic ASR; Transformer-Speech Recognition; Arabic Speech to Text |
url | https://journal.mtu.edu.iq/index.php/MTU/article/view/749 |
work_keys_str_mv | AT mohanadsameer arabicspeechrecognitionbasedonencoderdecoderarchitectureoftransformer AT ahmedtalib arabicspeechrecognitionbasedonencoderdecoderarchitectureoftransformer AT allahussein arabicspeechrecognitionbasedonencoderdecoderarchitectureoftransformer AT husnizahusni arabicspeechrecognitionbasedonencoderdecoderarchitectureoftransformer |