Analytical Comparison of Stop Word Recognition Methods in Persian Texts

Bibliographic Details
Main Authors: Mohammad Samie, Erta Bahmani, Niloofar Mozafari
Format: Article
Language: English
Published: Regional Information Center for Science and Technology (RICeST) 2025-01-01
Series: International Journal of Information Science and Management
Online Access: https://ijism.isc.ac/article_719446_1322bb8af0b283fd4d22f5dd5810090a.pdf
Description
Summary: Stop words are mostly low-information words that connect other words within a sentence. Because they carry little specific information about a text, they are typically removed during text processing, which makes stop word identification an essential preprocessing step. The difficulty is that the boundary is context-dependent: a word that is usually insignificant can become significant in some situations, while a word that is usually important can sometimes behave as a stop word. This problem is particularly pronounced in Persian because of the complexities inherent in the language. Given the importance of stop word identification in Persian, we reviewed and compared four approaches (dictionary-based, POS tagging-based, Word2Vec-based, and FastText-based) on a corpus of 50,000 Persian sentences from the Hamshahri dataset. Our findings indicate that the FastText-based approach outperformed the others, with a detection accuracy of 96.98%, suggesting that this method can support the development of an automatic, reliable, and efficient stop word detection system.
ISSN: 2008-8302, 2008-8310
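
As a rough illustration of the simplest of the four approaches named in the summary, a dictionary-based filter can be sketched as a fixed lookup set. This is a minimal sketch, not the authors' implementation: the seven-word list, the whitespace tokenization, and the function name `split_stop_words` are all assumptions made for the example.

```python
# Illustrative only: a tiny hand-picked set of common Persian function words,
# NOT the dictionary used in the article.
PERSIAN_STOP_WORDS = frozenset({"و", "در", "به", "از", "که", "را", "این"})


def split_stop_words(sentence, stop_words=PERSIAN_STOP_WORDS):
    """Split a whitespace-tokenized sentence into (content, stop) token lists."""
    content, stops = [], []
    for token in sentence.split():
        # Dictionary lookup: a token is a stop word only if it is in the set.
        (stops if token in stop_words else content).append(token)
    return content, stops


# "I borrowed this book from the library."
content, stops = split_stop_words("این کتاب را از کتابخانه گرفتم")
print(content)
print(stops)
```

Because the lookup ignores context, a sketch like this cannot handle the case the summary highlights, where a normally insignificant word becomes significant in a particular sentence.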