Simple-Random-Sampling-Based Multiclass Text Classification Algorithm

Multiclass text classification (MTC) is a challenging problem, and MTC algorithms are useful in many applications. In the era of big data, the space-time overhead of these algorithms must be taken into account. Through the investigation of the token frequency distribution in a Chinese web doc...

Bibliographic Details
Main Authors: Wuying Liu, Lin Wang, Mianzhu Yi
Format: Article
Language: English
Published: Wiley 2014-01-01
Series: The Scientific World Journal
Online Access: http://dx.doi.org/10.1155/2014/517498
_version_ 1832553390543994880
author Wuying Liu
Lin Wang
Mianzhu Yi
collection DOAJ
description Multiclass text classification (MTC) is a challenging problem, and MTC algorithms are useful in many applications. In the era of big data, the space-time overhead of these algorithms must be taken into account. By investigating the token frequency distribution in a Chinese web document collection, this paper reexamines the power law and proposes a simple-random-sampling-based MTC (SRSMTC) algorithm. Supported by a token-level memory that stores labeled documents, the SRSMTC algorithm uses a text retrieval approach to solve text classification problems. Experimental results on the TanCorp data set show that the SRSMTC algorithm achieves state-of-the-art performance with greatly reduced space-time requirements.
format Article
id doaj-art-44a9446d8d8c4537b9a659e1f92bb152
institution Kabale University
issn 2356-6140
1537-744X
language English
publishDate 2014-01-01
publisher Wiley
record_format Article
series The Scientific World Journal
spelling doaj-art-44a9446d8d8c4537b9a659e1f92bb152; 2025-02-03T05:54:02Z; eng; Wiley; The Scientific World Journal; 2356-6140; 1537-744X; 2014-01-01; 2014; 10.1155/2014/517498; 517498; Simple-Random-Sampling-Based Multiclass Text Classification Algorithm; Wuying Liu 0; Lin Wang 1; Mianzhu Yi 2; Department of Language Engineering, PLA University of Foreign Languages, Luoyang, Henan 471003, China; College of Humanities and Social Sciences, National University of Defense Technology, Changsha, Hunan 410073, China; Department of Language Engineering, PLA University of Foreign Languages, Luoyang, Henan 471003, China; Multiclass text classification (MTC) is a challenging problem, and MTC algorithms are useful in many applications. In the era of big data, the space-time overhead of these algorithms must be taken into account. By investigating the token frequency distribution in a Chinese web document collection, this paper reexamines the power law and proposes a simple-random-sampling-based MTC (SRSMTC) algorithm. Supported by a token-level memory that stores labeled documents, the SRSMTC algorithm uses a text retrieval approach to solve text classification problems. Experimental results on the TanCorp data set show that the SRSMTC algorithm achieves state-of-the-art performance with greatly reduced space-time requirements.; http://dx.doi.org/10.1155/2014/517498
title Simple-Random-Sampling-Based Multiclass Text Classification Algorithm
url http://dx.doi.org/10.1155/2014/517498