USAD: An Intelligent System for Slang and Abusive Text Detection in PERSO-Arabic-Scripted Urdu

The use of slang, abusive, and offensive language has become common practice on social media. Even though social media companies have censorship policies for slang, abusive, vulgar, and offensive language, due to limited resources and research in the automatic detection of abusive language mechanisms...

Bibliographic Details
Main Authors: Nauman Ul Haq, Mohib Ullah, Rafiullah Khan, Arshad Ahmad, Ahmad Almogren, Bashir Hayat, Bushra Shafi
Format: Article
Language: English
Published: Wiley 2020-01-01
Series: Complexity
Online Access: http://dx.doi.org/10.1155/2020/6684995
author Nauman Ul Haq
Mohib Ullah
Rafiullah Khan
Arshad Ahmad
Ahmad Almogren
Bashir Hayat
Bushra Shafi
collection DOAJ
description The use of slang, abusive, and offensive language has become common practice on social media. Social media companies have censorship policies for slang, abusive, vulgar, and offensive language, but because resources and research on automatic abusive language detection are limited for languages other than English, this condemnable practice persists. This study proposes USAD (Urdu Slang and Abusive words Detection), a lexicon-based intelligent framework to detect abusive and slang words in Perso-Arabic-scripted Urdu Tweets. Furthermore, because no standard dataset is available, we also design and annotate a dataset of abusive, offensive, and slang words in Perso-Arabic-scripted Urdu as our second significant contribution for future research. The results show that the proposed USAD model correctly identifies 72.6% of Tweets as abusive or nonabusive. Additionally, we identify some key factors that can help researchers improve their abusive language detection models.
format Article
id doaj-art-561904cfaa344657a7269dc5e07a2d49
institution Kabale University
issn 1076-2787
1099-0526
language English
publishDate 2020-01-01
publisher Wiley
record_format Article
series Complexity
spelling doaj-art-561904cfaa344657a7269dc5e07a2d49, 2025-02-03T01:05:10Z, eng, Wiley, Complexity, 1076-2787, 1099-0526, 2020-01-01, 2020, 10.1155/2020/6684995, 6684995
Nauman Ul Haq: Institute of Computer Science and Information Technology, The University of Agriculture, Peshawar 25000, Pakistan
Mohib Ullah: Institute of Computer Science and Information Technology, The University of Agriculture, Peshawar 25000, Pakistan
Rafiullah Khan: Institute of Computer Science and Information Technology, The University of Agriculture, Peshawar 25000, Pakistan
Arshad Ahmad: Department of IT & Computer Science, Pak-Austria Fachhochschule: Institute of Applied Sciences & Technology, Mang Khanpur Road, Haripur 22620, Pakistan
Ahmad Almogren: Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh 11633, Saudi Arabia
Bashir Hayat: Institute of Management Sciences, Peshawar 25000, Pakistan
Bushra Shafi: Department of Rural Sociology, The University of Agriculture, Peshawar 25000, Pakistan
title USAD: An Intelligent System for Slang and Abusive Text Detection in PERSO-Arabic-Scripted Urdu
url http://dx.doi.org/10.1155/2020/6684995