Simple-Random-Sampling-Based Multiclass Text Classification Algorithm
Multiclass text classification (MTC) is a challenging problem, and MTC algorithms are used in many applications. In the era of big data, the space-time overhead of these algorithms must be taken into account. Through the investigation of the token frequency distribution in a Chinese web doc...
Main Authors: | Wuying Liu, Lin Wang, Mianzhu Yi |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley 2014-01-01 |
Series: | The Scientific World Journal |
Online Access: | http://dx.doi.org/10.1155/2014/517498 |
_version_ | 1832553390543994880 |
---|---|
author | Wuying Liu Lin Wang Mianzhu Yi |
author_sort | Wuying Liu |
collection | DOAJ |
description | Multiclass text classification (MTC) is a challenging problem, and MTC algorithms are used in many applications. In the era of big data, the space-time overhead of these algorithms must be taken into account. By investigating the token frequency distribution in a Chinese web document collection, this paper reexamines the power law and proposes a simple-random-sampling-based MTC (SRSMTC) algorithm. Supported by a token-level memory that stores labeled documents, the SRSMTC algorithm uses a text retrieval approach to solve text classification problems. Experimental results on the TanCorp data set show that the SRSMTC algorithm achieves state-of-the-art performance with greatly reduced space-time requirements. |
format | Article |
id | doaj-art-44a9446d8d8c4537b9a659e1f92bb152 |
institution | Kabale University |
issn | 2356-6140 1537-744X |
language | English |
publishDate | 2014-01-01 |
publisher | Wiley |
record_format | Article |
series | The Scientific World Journal |
spelling | doaj-art-44a9446d8d8c4537b9a659e1f92bb152; 2025-02-03T05:54:02Z; eng; Wiley; The Scientific World Journal; 2356-6140; 1537-744X; 2014-01-01; 2014; 10.1155/2014/517498; 517498; Simple-Random-Sampling-Based Multiclass Text Classification Algorithm; Wuying Liu (Department of Language Engineering, PLA University of Foreign Languages, Luoyang, Henan 471003, China); Lin Wang (College of Humanities and Social Sciences, National University of Defense Technology, Changsha, Hunan 410073, China); Mianzhu Yi (Department of Language Engineering, PLA University of Foreign Languages, Luoyang, Henan 471003, China); http://dx.doi.org/10.1155/2014/517498 |
title | Simple-Random-Sampling-Based Multiclass Text Classification Algorithm |
url | http://dx.doi.org/10.1155/2014/517498 |
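
The description above outlines the general idea: keep a token-level memory of labeled documents and classify new text with a retrieval-style lookup. The following Python sketch illustrates that idea only; it is not the authors' SRSMTC implementation. The class name `SampledTokenIndex`, the `sample_rate` parameter, and the vote-counting score are all assumptions introduced here for illustration, and the record does not specify how the paper samples or scores tokens.

```python
import random
from collections import Counter, defaultdict


class SampledTokenIndex:
    """Hypothetical token-level memory: token -> per-class counts."""

    def __init__(self, sample_rate=0.5, seed=0):
        self.sample_rate = sample_rate      # assumed fraction of tokens kept per document
        self.rng = random.Random(seed)      # seeded so runs are reproducible
        self.index = defaultdict(Counter)   # token -> Counter of class labels

    def _sample(self, tokens):
        # Simple random sampling without replacement over a document's tokens.
        if not tokens:
            return []
        k = max(1, round(len(tokens) * self.sample_rate))
        return self.rng.sample(tokens, k)

    def add(self, tokens, label):
        # Store only the sampled tokens of a labeled document.
        for tok in self._sample(tokens):
            self.index[tok][label] += 1

    def classify(self, tokens):
        # Retrieval-style scoring (an assumption): each query token votes
        # for the classes it was stored under; the most-voted class wins.
        votes = Counter()
        for tok in tokens:
            votes.update(self.index.get(tok, {}))
        return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    memory = SampledTokenIndex(sample_rate=1.0)
    memory.add(["goal", "match", "team"], "sports")
    memory.add(["stock", "market", "bank"], "finance")
    print(memory.classify(["team", "goal"]))
```

In this sketch, lowering `sample_rate` stores fewer tokens per document and so shrinks the index; how much accuracy that costs would depend on the data and is not something this example measures.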