USAD: An Intelligent System for Slang and Abusive Text Detection in PERSO-Arabic-Scripted Urdu

The use of slang, abusive, and offensive language has become common practice on social media. Even though social media companies have censorship policies for slang, abusive, vulgar, and offensive language, due to limited resources and research in the automatic detection of abusive language mechanisms...

Bibliographic Details
Main Authors:	Nauman Ul Haq, Mohib Ullah, Rafiullah Khan, Arshad Ahmad, Ahmad Almogren, Bashir Hayat, Bushra Shafi
Format:	Article
Language:	English
Published:	Wiley 2020-01-01
Series:	Complexity
Online Access:	http://dx.doi.org/10.1155/2020/6684995

_version_	1832566064171450368
author	Nauman Ul Haq, Mohib Ullah, Rafiullah Khan, Arshad Ahmad, Ahmad Almogren, Bashir Hayat, Bushra Shafi
author_sort	Nauman Ul Haq
collection	DOAJ
description	The use of slang, abusive, and offensive language has become common practice on social media. Although social media companies have censorship policies for slang, abusive, vulgar, and offensive language, this condemnable practice persists because resources and research on automatic abusive language detection for languages other than English are limited. This study proposes USAD (Urdu Slang and Abusive words Detection), a lexicon-based intelligent framework to detect abusive and slang words in Perso-Arabic-scripted Urdu Tweets. Furthermore, because no standard dataset is available, we also design and annotate a dataset of abusive, offensive, and slang words in Perso-Arabic-scripted Urdu as our second significant contribution for future research. The results show that the proposed USAD model correctly identifies 72.6% of Tweets as abusive or nonabusive. Additionally, we identify some key factors that can help researchers improve their abusive language detection models.
format	Article
id	doaj-art-561904cfaa344657a7269dc5e07a2d49
institution	Kabale University
issn	1076-2787 1099-0526
language	English
publishDate	2020-01-01
publisher	Wiley
record_format	Article
series	Complexity
spelling	doaj-art-561904cfaa344657a7269dc5e07a2d49 | 2025-02-03T01:05:10Z | eng | Wiley | Complexity | 1076-2787 | 1099-0526 | 2020-01-01 | 2020 | 10.1155/2020/6684995 | 6684995 | USAD: An Intelligent System for Slang and Abusive Text Detection in PERSO-Arabic-Scripted Urdu | Nauman Ul Haq (0), Mohib Ullah (1), Rafiullah Khan (2), Arshad Ahmad (3), Ahmad Almogren (4), Bashir Hayat (5), Bushra Shafi (6) | (0) Institute of Computer Science and Information Technology, The University of Agriculture, Peshawar 25000, Pakistan; (1) Institute of Computer Science and Information Technology, The University of Agriculture, Peshawar 25000, Pakistan; (2) Institute of Computer Science and Information Technology, The University of Agriculture, Peshawar 25000, Pakistan; (3) Department of IT & Computer Science, Pak-Austria Fachhochschule: Institute of Applied Sciences & Technology, Mang Khanpur Road, Haripur 22620, Pakistan; (4) Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh 11633, Saudi Arabia; (5) Institute of Management Sciences, Peshawar 25000, Pakistan; (6) Department of Rural Sociology, The University of Agriculture, Peshawar 25000, Pakistan | http://dx.doi.org/10.1155/2020/6684995
title	USAD: An Intelligent System for Slang and Abusive Text Detection in PERSO-Arabic-Scripted Urdu
url	http://dx.doi.org/10.1155/2020/6684995
