Integrating Handcrafted Features with Machine Learning for Hate Speech Detection in Albanian Social Media

Online social media has seen a significant increase in usage over the last decade, enabling people to communicate more easily. The vast amount of data generated by these platforms is mostly uncontrolled and unmanageable. This has also provided opportunities for individuals to engage in hate speech a...

Bibliographic Details
Main Authors:	Fetahi Endrit, Hamiti Mentor, Susuri Arsim, Zenuni Xhemal, Ajdari Jaumin
Format:	Article
Language:	English
Published:	Sciendo 2024-12-01
Series:	SEEU Review
Subjects:	hate speech detection; machine learning; handcrafted features; Albanian; social media
Online Access:	https://doi.org/10.2478/seeur-2024-0025

_version_	1832570286564704256
author	Fetahi Endrit; Hamiti Mentor; Susuri Arsim; Zenuni Xhemal; Ajdari Jaumin
author_sort	Fetahi Endrit
collection	DOAJ
description	Online social media has seen a significant increase in usage over the last decade, enabling people to communicate more easily. The vast amount of data generated by these platforms is mostly uncontrolled and unmanageable. This has also provided opportunities for individuals to engage in hate speech and offensive language on these platforms. To address this issue, this research conducts extensive experiments using machine learning models and handcrafted feature extraction in Albanian, a low-resource language. We used several machine learning algorithms, including Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), and Logistic Regression (LR), and extracted a considerable number of handcrafted features. To improve accuracy, we performed feature selection to identify the most relevant features for detecting hate speech in the Albanian language. The results show that LR performed best, with an F1 score of 76.77. Random Forest feature ranking and SHAP analysis revealed that many comments on Albanian social media exhibit unique characteristics, resulting in a large feature set. This suggests that there is no clear pattern that lets the machine learning models flag the comments accurately, indicating that Albanian is linguistically challenging to analyze.
format	Article
id	doaj-art-15aac03c695a43eb8a87459881ae6d37
institution	Kabale University
issn	1857-8462
language	English
publishDate	2024-12-01
publisher	Sciendo
record_format	Article
series	SEEU Review
spelling	doaj-art-15aac03c695a43eb8a87459881ae6d37 | 2025-02-02T15:49:09Z | eng | Sciendo | SEEU Review | 1857-8462 | 2024-12-01 | 19 | 2 | 80 | 92 | 10.2478/seeur-2024-0025 | Integrating Handcrafted Features with Machine Learning for Hate Speech Detection in Albanian Social Media | Fetahi Endrit (1); Hamiti Mentor (1); Susuri Arsim (2); Zenuni Xhemal (1); Ajdari Jaumin (1) | (1) Faculty of Contemporary Sciences and Technologies, South East European University, Tetovo, North Macedonia | (2) Faculty of Computer Sciences, University of Prizren Ukshin Hoti, Prizren, Kosovo | https://doi.org/10.2478/seeur-2024-0025 | hate speech detection; machine learning; handcrafted features; Albanian; social media
title	Integrating Handcrafted Features with Machine Learning for Hate Speech Detection in Albanian Social Media
topic	hate speech detection; machine learning; handcrafted features; Albanian; social media
url	https://doi.org/10.2478/seeur-2024-0025

Similar Items