Analytical Comparison of Stop Word Recognition Methods in Persian Texts

Bibliographic Details
Main Authors: Mohammad Samie, Erta Bahmani, Niloofar Mozafari
Format: Article
Language: English
Published: Regional Information Center for Science and Technology (RICeST) 2025-01-01
Series: International Journal of Information Science and Management
Online Access: https://ijism.isc.ac/article_719446_1322bb8af0b283fd4d22f5dd5810090a.pdf
collection DOAJ
description Stop words are primarily non-significant words used to connect other words in sentence construction. Because these words carry no specific information about the text, they are typically removed during text processing, which makes identifying them an essential operation. A challenge arises because words that are usually insignificant can become significant depending on the context, while words that are typically important can sometimes be classified as stop words. This problem is particularly pronounced in Persian due to the complexities inherent in the language. Recognizing the importance of identifying stop words in Persian, we analyzed and reviewed various approaches, including dictionary-based, POS tagging-based, Word2Vec-based, and FastText-based approaches, to identify stop words using a corpus of 50,000 Persian sentences from the Hamshahri dataset. Our findings indicate that the FastText-based approach outperformed the others with a detection accuracy of 96.98%, suggesting that this method can lead to the development of an automatic, reliable, and efficient system.
id doaj-art-ab753a0640754ce2b1c5d161b1066eed
institution Kabale University
issn 2008-8302
2008-8310
spelling doaj-art-ab753a0640754ce2b1c5d161b1066eed (2025-01-18T06:18:23Z)
Volume 23, Issue 1, Pages 91-107
DOI: 10.22034/ijism.2025.2017335.1322
Article ID: 719446
Affiliations:
Mohammad Samie: Department of Computer Engineering and IT, Jahrom University, Jahrom, Iran
Erta Bahmani: Department of Computer Engineering and IT, Jahrom University, Jahrom, Iran
Niloofar Mozafari: Islamic World Science and Technology Monitoring and Citation Institute (ISC), Shiraz, Iran
topic stop words
content words
Persian language processing
POS tagging
Word2Vec
FastText