Hybrid Feature and Optimized Deep Learning Model Fusion for Detecting Hateful Arabic Content

Bibliographic Details
Main Authors: Karim Gasmi, Ibtihel Ben Ltaifa, Alameen Eltoum Abdalrahman, Omer Hamid, Mohamed Othman Altaieb, Shahzad Ali, Lassaad Ben Ammar, Manel Mrabet
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Ensemble learning; grey wolf algorithm; hateful content detection; weight selection
Online Access: https://ieeexplore.ieee.org/document/11088089/
author Karim Gasmi
Ibtihel Ben Ltaifa
Alameen Eltoum Abdalrahman
Omer Hamid
Mohamed Othman Altaieb
Shahzad Ali
Lassaad Ben Ammar
Manel Mrabet
collection DOAJ
description Detecting hate speech in Arabic social media content is critical for ensuring safe, inclusive, and respectful online communication. However, this task remains challenging due to Arabic’s morphological richness, dialectal variations such as Levantine, and the scarcity of high-quality annotated data. This study proposes a comprehensive and language-aware approach to Arabic hate speech detection that integrates advanced preprocessing, targeted data augmentation, hybrid feature extraction, and deep ensemble learning. Our experiments are conducted on a Levantine Arabic tweet dataset labeled as hateful or non-hateful. To address the lexical variability and noise common in user-generated content, we apply a dedicated preprocessing pipeline that includes normalization, diacritic removal, and emoji filtering. To further enhance generalization and mitigate data imbalance, we employ two augmentation strategies: synonym replacement using a curated Arabic lexicon and semantic-preserving back-translation through English. We investigate lexical and contextual approaches to feature extraction, including TF-IDF vectors, contextualized AraBERT embeddings, and a hybrid combination of both. These features are fed into multiple deep learning classifiers, including CNN-BiGRU, BiLSTM, and DNN architectures. To maximize predictive performance, we develop an ensemble framework that integrates these models. The final prediction is obtained through a weighted fusion of individual model outputs, where the weights are selected using the Grey Wolf Optimizer (GWO) to maximize classification accuracy. Experimental results show that the proposed hybrid, ensemble-based architecture achieves an accuracy of 83.33% and a ROC-AUC score of 89.5%, outperforming the individual models and conventional baselines. These findings highlight the effectiveness of hybrid feature representations and nature-inspired optimization in enhancing Arabic hate speech detection. Our approach offers a scalable, linguistically informed solution for robust content moderation in Arabic digital spaces.
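The weighted fusion step described in the abstract can be illustrated with a minimal sketch. It assumes each base model outputs a positive-class probability per sample and that the fitness is plain accuracy at a 0.5 threshold. The function names (`fuse`, `accuracy`, `gwo_weights`), the population size, and the iteration count are hypothetical choices for illustration, not the authors' published settings. The update rule is the standard Grey Wolf Optimizer formulation, in which the three best wolves (alpha, beta, delta) guide the rest.

```python
import numpy as np

def fuse(probs, weights):
    """Weighted average of per-model probabilities; probs is (n_samples, n_models)."""
    w = np.abs(weights)
    w = w / (w.sum() + 1e-12)  # normalize so the weights sum to (about) 1
    return probs @ w

def accuracy(probs, weights, y_true):
    """Accuracy of the fused prediction at a 0.5 decision threshold."""
    y_pred = (fuse(probs, weights) >= 0.5).astype(int)
    return float((y_pred == y_true).mean())

def gwo_weights(probs, y_true, n_wolves=20, n_iter=50, seed=0):
    """Search fusion weights with a basic Grey Wolf Optimizer (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_models = probs.shape[1]
    wolves = rng.random((n_wolves, n_models))  # candidate weight vectors in [0, 1]
    best_w, best_fit = wolves[0].copy(), -1.0

    for t in range(n_iter):
        fitness = np.array([accuracy(probs, w, y_true) for w in wolves])
        order = np.argsort(-fitness)  # indices sorted by descending fitness
        alpha, beta, delta = wolves[order[:3]]  # three leaders (copied rows)
        if fitness[order[0]] > best_fit:
            best_fit, best_w = fitness[order[0]], alpha.copy()

        a = 2.0 * (1.0 - t / n_iter)  # exploration factor, decays from 2 toward 0
        new_pos = np.zeros_like(wolves)
        for leader in (alpha, beta, delta):
            r1 = rng.random(wolves.shape)
            r2 = rng.random(wolves.shape)
            A = 2.0 * a * r1 - a
            C = 2.0 * r2
            D = np.abs(C * leader - wolves)  # distance of each wolf to the leader
            new_pos += leader - A * D
        wolves = np.clip(new_pos / 3.0, 0.0, 1.0)  # average of the three pulls

    # Evaluate the final population as well, then return normalized weights.
    fitness = np.array([accuracy(probs, w, y_true) for w in wolves])
    if fitness.max() > best_fit:
        best_w = wolves[int(fitness.argmax())].copy()
    best_w = best_w / (best_w.sum() + 1e-12)
    return best_w, accuracy(probs, best_w, y_true)
```

In practice the weights would be selected on a held-out validation split and then applied unchanged to the test predictions, so that the reported accuracy is not tuned on the test data.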
format Article
id doaj-art-393e116c32cf4a00a771c70bb4e23cc3
institution DOAJ
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-393e116c32cf4a00a771c70bb4e23cc3
Last updated: 2025-08-20T02:46:19Z
Language: eng
Publisher: IEEE
Series: IEEE Access
ISSN: 2169-3536
Published: 2025-01-01
Volume: 13
Pages: 131411-131431
DOI: 10.1109/ACCESS.2025.3591673
IEEE document number: 11088089
Title: Hybrid Feature and Optimized Deep Learning Model Fusion for Detecting Hateful Arabic Content
Authors and affiliations:
Karim Gasmi (https://orcid.org/0000-0003-0138-2226), Department of Computer Science, College of Computer and Information Sciences, Jouf University, Sakaka, Saudi Arabia
Ibtihel Ben Ltaifa (https://orcid.org/0000-0001-7796-149X), STIH, Sorbonne Université, Paris, France
Alameen Eltoum Abdalrahman (https://orcid.org/0000-0001-6325-9069), Department of Computer Science, College of Computer and Information Sciences, Jouf University, Sakaka, Saudi Arabia
Omer Hamid (https://orcid.org/0009-0006-0369-402X), Cybersecurity Department, College of Engineering and Information Technology, Buraydah Private Colleges, Buraydah, Saudi Arabia
Mohamed Othman Altaieb (https://orcid.org/0000-0003-2200-0194), Department of Computer Science, College of Computer and Information Sciences, Jouf University, Sakaka, Saudi Arabia
Shahzad Ali (https://orcid.org/0000-0002-3787-7098), Department of Computer Science, College of Computer and Information Sciences, Jouf University, Sakaka, Saudi Arabia
Lassaad Ben Ammar (https://orcid.org/0000-0002-4698-3693), Department of Computer Science, Prince Sattam bin Abdulaziz University, Al-Kharj, Saudi Arabia
Manel Mrabet (https://orcid.org/0009-0007-7638-9939), Department of Computer Science, Prince Sattam bin Abdulaziz University, Al-Kharj, Saudi Arabia
Online access: https://ieeexplore.ieee.org/document/11088089/
Keywords: Ensemble learning; grey wolf algorithm; hateful content detection; weight selection
title Hybrid Feature and Optimized Deep Learning Model Fusion for Detecting Hateful Arabic Content
topic Ensemble learning
grey wolf algorithm
hateful content detection
weight selection
url https://ieeexplore.ieee.org/document/11088089/