USAD: An Intelligent System for Slang and Abusive Text Detection in PERSO-Arabic-Scripted Urdu
The use of slang, abusive, and offensive language has become common practice on social media. Even though social media companies have censorship policies for slang, abusive, vulgar, and offensive language, due to limited resources and research in the automatic detection of abusive language mechanisms...
Main Authors: | Nauman Ul Haq, Mohib Ullah, Rafiullah Khan, Arshad Ahmad, Ahmad Almogren, Bashir Hayat, Bushra Shafi |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2020-01-01 |
Series: | Complexity |
Online Access: | http://dx.doi.org/10.1155/2020/6684995 |
_version_ | 1832566064171450368 |
---|---|
author | Nauman Ul Haq Mohib Ullah Rafiullah Khan Arshad Ahmad Ahmad Almogren Bashir Hayat Bushra Shafi |
author_sort | Nauman Ul Haq |
collection | DOAJ |
description | The use of slang, abusive, and offensive language has become common practice on social media. Although social media companies have censorship policies for slang, abusive, vulgar, and offensive language, the practice persists because resources and research on automatic abusive-language detection for languages other than English are limited. This study proposes USAD (Urdu Slang and Abusive words Detection), a lexicon-based intelligent framework to detect abusive and slang words in Perso-Arabic-scripted Urdu Tweets. Furthermore, because no standard dataset is available, we also design and annotate a dataset of abusive, offensive, and slang words in Perso-Arabic-scripted Urdu as our second significant contribution for future research. The results show that the proposed USAD model correctly classifies 72.6% of Tweets as abusive or nonabusive. Additionally, we identify some key factors that can help researchers improve their abusive language detection models. |
format | Article |
id | doaj-art-561904cfaa344657a7269dc5e07a2d49 |
institution | Kabale University |
issn | 1076-2787 1099-0526 |
language | English |
publishDate | 2020-01-01 |
publisher | Wiley |
record_format | Article |
series | Complexity |
spelling | doaj-art-561904cfaa344657a7269dc5e07a2d49; 2025-02-03T01:05:10Z; eng; Wiley; Complexity; 1076-2787, 1099-0526; 2020-01-01; 2020; 10.1155/2020/6684995; 6684995; USAD: An Intelligent System for Slang and Abusive Text Detection in PERSO-Arabic-Scripted Urdu; Nauman Ul Haq [0], Mohib Ullah [1], Rafiullah Khan [2], Arshad Ahmad [3], Ahmad Almogren [4], Bashir Hayat [5], Bushra Shafi [6]; [0] Institute of Computer Science and Information Technology, The University of Agriculture, Peshawar 25000, Pakistan; [1] Institute of Computer Science and Information Technology, The University of Agriculture, Peshawar 25000, Pakistan; [2] Institute of Computer Science and Information Technology, The University of Agriculture, Peshawar 25000, Pakistan; [3] Department of IT & Computer Science, Pak-Austria Fachhochschule: Institute of Applied Sciences & Technology, Mang Khanpur Road, Haripur 22620, Pakistan; [4] Department of Computer Science, College of Computer and Information Sciences, King Saud University, Riyadh 11633, Saudi Arabia; [5] Institute of Management Sciences, Peshawar 25000, Pakistan; [6] Department of Rural Sociology, The University of Agriculture, Peshawar 25000, Pakistan; http://dx.doi.org/10.1155/2020/6684995 |
spellingShingle | Nauman Ul Haq Mohib Ullah Rafiullah Khan Arshad Ahmad Ahmad Almogren Bashir Hayat Bushra Shafi USAD: An Intelligent System for Slang and Abusive Text Detection in PERSO-Arabic-Scripted Urdu Complexity |
title | USAD: An Intelligent System for Slang and Abusive Text Detection in PERSO-Arabic-Scripted Urdu |
title_sort | usad an intelligent system for slang and abusive text detection in perso arabic scripted urdu |
url | http://dx.doi.org/10.1155/2020/6684995 |