Ptr4BERT: Automatic Semisupervised Chinese Government Message Text Classification Method Based on Transformer-Based Pointer Generator Network

With the development of Internet technology, government affairs can be handled online. More and more citizens are using online platforms to report to government departments, which is generating a lot of textual data. Among them, the basic but important problem is to automatically classify the differ...

Bibliographic Details
Main Authors: Mingxin Li, Kaiqian Yin, Minghao Wang
Format: Article
Language: English
Published: Wiley 2022-01-01
Series: Advances in Multimedia
Online Access: http://dx.doi.org/10.1155/2022/6540696
collection DOAJ
description With the development of Internet technology, government affairs can now be handled online. More and more citizens use online platforms to report to government departments, which generates a large volume of textual data. A basic but important problem is to classify these messages automatically by category, so that staff in different departments can process the relevant information quickly. However, government messages are updated rapidly, carry a large amount of information, are long, and make it difficult to capture key points, which makes supervised learning methods unsuitable for processing such texts. To address these problems, we propose a semisupervised text classification method based on a transformer-based pointer generator network, named Ptr4BERT, which uses the pointer generator network with BERT (Bidirectional Encoder Representations from Transformers) embeddings as a preprocessor for feature extraction. With this method, text classification can achieve very good results with a small set of labeled data by extracting features exclusively from the message text. To verify the effectiveness of the proposed model, we ran experiments on two datasets, HNMes and QDMes, which we collected from different websites with a purpose-built crawler. The experimental results show that the proposed method significantly outperforms state-of-the-art methods.
id doaj-art-4c2584ca531e452e811102fb2775ef57
institution Kabale University
issn 1687-5699
spelling doaj-art-4c2584ca531e452e811102fb2775ef57; 2025-02-03T05:50:38Z; eng; Wiley; Advances in Multimedia; 1687-5699; 2022-01-01; 2022; 10.1155/2022/6540696; Ptr4BERT: Automatic Semisupervised Chinese Government Message Text Classification Method Based on Transformer-Based Pointer Generator Network; Mingxin Li (0), Kaiqian Yin (1), Minghao Wang (2); College of Computer Science and Engineering; College of Mathematics and Systems Science; College of Mechanical and Electronic Engineering; http://dx.doi.org/10.1155/2022/6540696