A Comparative Study of Some Automatic Arabic Text Diacritization Systems

Bibliographic Details
Main Authors: Ali Mijlad, Yacine El Younoussi
Format: Article
Language: English
Published: Wiley, 2022-01-01
Series: Advances in Human-Computer Interaction
Online Access: http://dx.doi.org/10.1155/2022/3613710
Description: Arabic diacritization is the task of restoring the diacritics, or short-vowel marks, of Arabic text, which is usually written without them. Automating this task improves results on some downstream natural language processing tasks, which makes it important for Arabic language processing. In this paper, we present a comparative study of two automatic diacritization systems. The first uses a variant of the hidden Markov model (HMM). The second is a pipeline that combines a Long Short-Term Memory (LSTM) deep learning model, a rule-based correction component, and a statistical component. We also propose modifications to both systems. We trained and tested both systems on the same benchmark dataset, using the evaluation metrics proposed in previous work. The best system achieves a diacritic error rate (DER) of 9.42% and a word error rate (WER) of 22.82%.
ISSN: 1687-5907
Author Affiliation: SIGL Laboratory (both authors)
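The record reports results as DER and WER without defining them. The sketch below assumes the definitions most often used for Arabic diacritization: DER is the percentage of Arabic letters whose predicted diacritics differ from the reference, and WER is the percentage of words that contain at least one such letter. The paper's exact conventions (for example, whether case endings or undiacritized letters are scored) are not stated in this record, so the helper names `split_letters` and `error_rates`, the diacritic set, and the scoring choices are all hypothetical.

```python
"""Hypothetical DER/WER scorer for Arabic diacritization (a sketch, not the paper's code)."""

# Assumed diacritic set: U+064B..U+0652 (tanween, fatha, damma, kasra, shadda, sukun).
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}
TATWEEL = "\u0640"


def is_arabic_letter(ch: str) -> bool:
    """True for base letters in U+0621..U+064A, excluding the tatweel."""
    return "\u0621" <= ch <= "\u064A" and ch != TATWEEL


def split_letters(word: str) -> list[tuple[str, str]]:
    """Split a diacritized word into (base character, diacritics) pairs."""
    pairs: list[list[str]] = []
    for ch in word:
        if ch in DIACRITICS:
            if pairs:  # attach the mark to the preceding base character
                pairs[-1][1] += ch
        else:
            pairs.append([ch, ""])
    # Sort the marks so that shadda+fatha and fatha+shadda compare equal.
    return [(base, "".join(sorted(marks))) for base, marks in pairs]


def error_rates(reference: str, predicted: str) -> tuple[float, float]:
    """Return (DER, WER) in percent for two diacritizations of the same text."""
    ref_words, pred_words = reference.split(), predicted.split()
    if len(ref_words) != len(pred_words):
        raise ValueError("texts must contain the same number of words")
    letters = letter_errors = words = word_errors = 0
    for ref_word, pred_word in zip(ref_words, pred_words):
        ref_pairs, pred_pairs = split_letters(ref_word), split_letters(pred_word)
        if [b for b, _ in ref_pairs] != [b for b, _ in pred_pairs]:
            raise ValueError(f"base letters differ: {ref_word!r} vs {pred_word!r}")
        # Score only Arabic letters; punctuation, digits and Latin are ignored.
        scored = [
            (ref_marks, pred_marks)
            for (base, ref_marks), (_, pred_marks) in zip(ref_pairs, pred_pairs)
            if is_arabic_letter(base)
        ]
        if not scored:
            continue
        errors = sum(r != p for r, p in scored)
        letters += len(scored)
        letter_errors += errors
        words += 1
        word_errors += errors > 0  # a word is wrong if any of its letters is wrong
    if letters == 0:
        raise ValueError("no Arabic letters to score")
    return 100 * letter_errors / letters, 100 * word_errors / words


if __name__ == "__main__":
    # "kataba al-waladu" (partially diacritized) as reference.
    ref = "\u0643\u064e\u062a\u064e\u0628\u064e \u0627\u0644\u0648\u064e\u0644\u064e\u062f\u064f"
    # Same text with a kasra instead of a fatha on the second letter.
    pred = "\u0643\u064e\u062a\u0650\u0628\u064e \u0627\u0644\u0648\u064e\u0644\u064e\u062f\u064f"
    print(error_rates(ref, pred))  # (12.5, 50.0)
```

In the usage example, one of eight letters is wrong (DER 12.5%) and one of two words contains that error (WER 50%). Sorting the marks within each letter is a design choice so that equivalent orderings of shadda plus a vowel are not counted as errors; benchmarks may normalize differently.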