A Hybrid Deep Learning-Machine Learning Stacking Model for Yemeni Arabic Dialect Sentiment Analysis

Bibliographic Details
Main Authors: Alaa Abdulkareem Hameed Brihi, Mossa Ghurab
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11097878/
Description
Summary: With the rise of online communities, Yemeni Arabic has become increasingly visible in written social media content. Nevertheless, sentiment analysis studies have largely centered on Modern Standard Arabic (MSA) and other regional varieties (e.g., Egyptian, Levantine, Gulf), leaving the Yemeni dialect understudied and short of specialized resources. This study addresses this gap by introducing the largest sentiment-annotated corpus for the Yemeni Arabic dialect, consisting of 45,862 manually labeled comments collected from the Facebook pages of the main telecommunications companies in Yemen (Yemen Telecom, Yemen Mobile, YOU, and Sabafon). These comments comprise user feedback on public service matters related to these companies. Multiple reviewers annotated the data to ensure reliability, with disagreements resolved through discussion. We also constructed a novel Yemeni dialect sentiment lexicon that classifies words and phrases according to polarity. We systematically evaluate lexicon-based, machine learning, deep learning, and hybrid sentiment analysis approaches on user sentiment regarding Yemeni telecom services. Several advanced deep learning models were applied to the dataset and showed superior performance compared to other methods. The best experimental results came from a hybrid stacked approach that uses deep networks for hierarchical feature extraction and random forest (RF) or LinearSVC as the meta-learner, achieving the highest accuracies of 94.71% and 94.28%, respectively. This study provides a preliminary reference for sentiment analysis in low-resource Arabic dialects such as Yemeni and highlights the effectiveness of stacked feature engineering and meta-learning in dialectal NLP tasks.
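The stacking idea in the summary (base models feeding an RF or LinearSVC meta-learner) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation: the synthetic features stand in for encoded comments, the three-class setup is an assumption, and the two small `MLPClassifier` models are lightweight placeholders for the deep networks described in the article. The accuracies it prints have no relation to the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC

# Synthetic stand-in for vectorized comments (assumption: 3 sentiment classes).
X, y = make_classification(
    n_samples=400, n_features=40, n_informative=12, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def base_learners():
    """Placeholder base models; the article uses deep networks here."""
    return [
        ("mlp_small", MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=0)),
        ("mlp_deep", MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=1)),
    ]


# The two meta-learners named in the summary.
meta_learners = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "LinearSVC": LinearSVC(),
}

scores = {}
for name, meta in meta_learners.items():
    # StackingClassifier trains the meta-learner on out-of-fold base
    # predictions (cv=5), so the meta-learner never sees predictions made
    # on data the base models were fitted on.
    stack = StackingClassifier(estimators=base_learners(), final_estimator=meta, cv=5)
    stack.fit(X_train, y_train)
    scores[name] = stack.score(X_test, y_test)
    print(f"{name} meta-learner accuracy: {scores[name]:.3f}")
```

Swapping the meta-learner while keeping the base models fixed, as the loop does, mirrors the RF versus LinearSVC comparison in the summary, under the assumptions stated above.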
ISSN: 2169-3536