A Hybrid Deep Learning-Machine Learning Stacking Model for Yemeni Arabic Dialect Sentiment Analysis
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11097878/ |
| Summary: | With the rise of online communities, Yemeni Arabic has become increasingly visible in written social media content. Nevertheless, sentiment analysis studies have largely centered on Modern Standard Arabic (MSA) and other regional varieties (e.g., Egyptian, Levantine, Gulf), leaving the Yemeni dialect understudied and short of specialized resources. This study addresses the gap by introducing the largest sentiment-annotated corpus for the Yemeni Arabic dialect, consisting of 45,862 manually labeled Facebook comments collected from the pages of the main telecommunications companies in Yemen (Yemen Telecom, Yemen Mobile, YOU, and Sabafon). The comments comprise user feedback on public service matters related to these companies. Multiple reviewers annotated the data to ensure reliability, with disagreements resolved through discussion. Moreover, we constructed a novel Yemeni dialect sentiment lexicon that classifies words and phrases according to polarity. We systematically evaluate lexicon-based, machine learning, deep learning, and hybrid sentiment analysis approaches on user sentiment regarding Yemeni telecom services. Several advanced deep learning models were implemented on our dataset and showed superior performance compared with the other methods. The experimental results indicate that a hybrid stacked approach, which uses deep networks for hierarchical feature extraction and random forest (RF) or LinearSVC as the meta-learner, achieves the highest accuracies of 94.71% and 94.28%, respectively. This study provides a preliminary reference for sentiment analysis in low-resource Arabic dialects such as Yemeni and highlights the effectiveness of stacked feature engineering and meta-learning in dialectal NLP tasks. |
|---|---|
| ISSN: | 2169-3536 |
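
The summary mentions a lexicon-based baseline built on a polarity lexicon. The record does not describe how that baseline scores a comment, so the following is only a minimal sketch of the general idea, under stated assumptions: the lexicon entries are generic Arabic words chosen for illustration (not items from the paper's Yemeni dialect lexicon), the function name `lexicon_polarity` is hypothetical, only single whitespace-separated tokens are matched (no phrases), and the three-way label set is an assumption.

```python
# Hypothetical toy lexicon: placeholder entries for illustration only,
# NOT taken from the paper's Yemeni dialect sentiment lexicon.
TOY_LEXICON = {
    "ممتاز": 1,   # "excellent"
    "شكرا": 1,    # "thanks"
    "سيء": -1,    # "bad"
    "بطيء": -1,   # "slow"
}


def lexicon_polarity(text, lexicon=TOY_LEXICON):
    """Label a comment by summing the polarity of its whitespace-separated tokens.

    Tokens missing from the lexicon contribute 0. The neutral class and the
    zero threshold are assumptions of this sketch, not the paper's scheme.
    """
    score = sum(lexicon.get(token, 0) for token in text.split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # One negative token ("slow") and no positive ones.
    print(lexicon_polarity("النت بطيء"))
```

A scorer like this ignores negation, spelling variation, and multi-word phrases, which is one plausible reason learned models tend to outperform purely lexicon-based baselines on dialectal text.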