Advancing Hate Speech Detection in Indonesian Language Using Graph Neural Networks and TF-IDF

Bibliographic Details
Main Authors: Syaikha Amirah Zikrina, Fitriyani
Format: Article
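Language: English
Published: Ikatan Ahli Informatika Indonesia 2025-02-01
Series: Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
Subjects: context-aware sentiment analysis; hate speech detection; graph neural network (GNN); social media X; TF-IDF
Online Access: https://jurnal.iaii.or.id/index.php/RESTI/article/view/6179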
collection DOAJ
description Hate speech and abusive content on social media, particularly in the Indonesian language, present significant challenges for content moderation systems. Previous research has applied machine learning models such as Recurrent Neural Networks (RNN), Support Vector Machines (SVM), and Convolutional Neural Networks (CNN) to this problem. However, these approaches are limited in their ability to capture the relational and contextual nuances inherent in the data, which results in suboptimal performance. This study introduces an approach that combines Graph Neural Networks (GNN) with Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction to improve hate speech detection on Twitter (platform X). The dataset consists of 13,169 Indonesian tweets, manually labeled for the hate speech and abusive categories. Preprocessing steps include text cleaning, stemming, stop-word removal, and normalization. The GNN model achieved accuracies of 92.90% for Abusive and 89.78% for Hate Speech, outperforming the RNN model, which achieved 86.09% and 86.15%, respectively. These results highlight the advantage of graph-based approaches in capturing complex relationships within text data. Future research can expand the dataset to include regional dialects and integrate more advanced feature extraction techniques such as Word2Vec or BERT. The study establishes a framework for improving hate speech detection and contributes to safer digital environments.
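The preprocessing and TF-IDF steps named in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word list, the sample tweets, and the helper names `preprocess` and `tfidf` are hypothetical, stemming is omitted, and the weighting (relative term frequency times `log(N / df) + 1`) is only one common TF-IDF variant. The GNN stage is not covered here.

```python
import math
import re
from collections import Counter

# Hypothetical mini stop-word list; a real Indonesian list would be much longer.
STOP_WORDS = {"yang", "dan", "di", "itu", "ini"}


def preprocess(text):
    """Lowercase, strip URLs, mentions and non-letters, then drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        vectors.append(
            {t: (c / total) * (math.log(n_docs / df[t]) + 1.0)
             for t, c in counts.items()}
        )
    return vectors


# Made-up example tweets, for illustration only.
tweets = [
    "Kamu itu bodoh dan malas @user",
    "Cuaca di Bandung cerah hari ini https://example.com",
    "Dasar bodoh!",
]
vectors = tfidf(tweets)
```

Terms that occur in fewer tweets receive a larger weight, so in the first sample tweet "kamu" outweighs "bodoh", which also appears in the third. Vectors of this kind could then serve as node features for a graph model.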
id doaj-art-8f9f08ec16e641c3b85257b8f054581a
institution OA Journals
issn 2580-0760
spelling doaj-art-8f9f08ec16e641c3b85257b8f054581a; 2025-08-20T02:06:16Z; eng; Ikatan Ahli Informatika Indonesia; Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi); ISSN 2580-0760; 2025-02-01; vol. 9, no. 1, pp. 137-145; DOI 10.29207/resti.v9i1.6179; Syaikha Amirah Zikrina (Telkom University); Fitriyani (Telkom University)
topic context-aware sentiment analysis
hate speech detection
graph neural network (gnn)
social media x
tf-idf