Application of an Improved TF-IDF Method in Literary Text Classification

Literature plays an important role in the advancement of human civilization. Literary texts of many genres have been produced since ancient times, and more are produced every day. An urgent concern for managers in the literary field is how to classify and store the growing mass of literary text dat...


Bibliographic Details
Main Author:	Lin Xiang
Format:	Article
Language:	English
Published:	Wiley 2022-01-01
Series:	Advances in Multimedia
Online Access:	http://dx.doi.org/10.1155/2022/9285324

collection	DOAJ
description	Literature plays an important role in the advancement of human civilization. Literary texts of many genres have been produced since ancient times, and more are produced every day. An urgent concern for managers in the literary field is how to classify and store the growing mass of literary text data so that readers can access it easily. In text classification, TF-IDF is a widely used feature-weighting algorithm. However, it has significant shortcomings: it lacks information about how feature words are distributed within categories, it lacks information about how they are distributed between categories, and it cannot adapt to skewed datasets. In this paper's application scenario, TF-IDF can improve classification accuracy by exploiting the association between feature words and the number of texts in which they appear, but it ignores the variation in feature word distribution across categories. To classify literary texts, this work proposes an improved IDF method that addresses feature words which appear many times and carry different meanings in different fields. The meanings of feature words in distinct domains are separated to make the output of the TF-IDF algorithm more trustworthy. Comparative experiments on feature dimension selection, feature selection algorithm, feature weighting algorithm, and classifier show that the improved TF-IDF method, combined with a random forest (RF) classifier, classifies well enough to meet practical needs. It has a fair amount of historical significance.
id	doaj-art-6832a10ab436478c96a56b28d2aa3f5c
institution	Kabale University
issn	1687-5699
affiliation	Lin Xiang, Public Basic Course Teaching Department
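
The limitation named in the description, that standard TF-IDF carries no information about how a feature word is distributed between categories, can be illustrated with a minimal sketch. This is not the improved IDF method of the article, whose formula is not given in this record. The toy corpus, the category labels, and the helper names idf, tf_idf, and category_counts are all assumptions made for illustration, and the IDF variant used here is the plain log(N / df) form.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (category label, tokenized text).
corpus = [
    ("poetry", ["moon", "river", "sorrow"]),
    ("poetry", ["moon", "wine", "night"]),
    ("novel", ["city", "river", "family"]),
    ("novel", ["city", "war", "family"]),
]
docs = [tokens for _, tokens in corpus]


def idf(term, docs):
    """Plain inverse document frequency: log(N / df)."""
    df = sum(1 for tokens in docs if term in tokens)
    return math.log(len(docs) / df) if df else 0.0


def tf_idf(term, tokens, docs):
    """Term frequency (relative count in one text) times IDF."""
    tf = Counter(tokens)[term] / len(tokens)
    return tf * idf(term, docs)


def category_counts(term):
    """Number of texts containing the term, per category."""
    return Counter(label for label, tokens in corpus if term in tokens)


# "moon" occurs only in poetry texts; "river" is split across both
# categories. Both occur in two of the four texts, so plain IDF cannot
# tell them apart even though "moon" is the better category indicator.
for term in ("moon", "river"):
    print(term, dict(category_counts(term)), idf(term, docs),
          tf_idf(term, docs[0], docs))
```

Both words receive the same weight in the first text because IDF depends only on the number of texts containing the word. A category-aware weighting scheme would need an extra term built from something like category_counts.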


Similar Items