Chinese Unknown Word Recognition for PCFG-LA Parsing

This paper investigates the recognition of unknown words in Chinese parsing. Two methods are proposed to handle this problem. One is the modification of a character-based model. We model the emission probability of an unknown word using the first and last characters in the word. It aims to reduce th...

Bibliographic Details
Main Authors: Qiuping Huang, Liangye He, Derek F. Wong, Lidia S. Chao
Format: Article
Language: English
Published: Wiley 2014-01-01
Series: The Scientific World Journal
Online Access: http://dx.doi.org/10.1155/2014/959328
Collection: DOAJ
Description: This paper investigates the recognition of unknown words in Chinese parsing and proposes two methods to handle the problem. The first modifies a character-based model: the emission probability of an unknown word is modeled using the first and last characters of the word, with the aim of reducing the POS tag ambiguity of unknown words and thereby improving parsing performance. The second is a novel method based on graph-based semisupervised learning (SSL), which seeks to discover additional lexical knowledge from a large amount of unlabeled data. It propagates lexical emission probabilities to unknown words by building similarity graphs over the words of the labeled and unlabeled data, and the derived distributions are incorporated into the parsing process. Empirical results on the Penn Chinese Treebank and the TCT Treebank show that both methods are effective in handling unknown words and improving parsing.
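Illustration (not from the paper): the graph-based propagation step mentioned in the description can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The function name `propagate`, the toy graph, the edge weights, and the tag set are all hypothetical, and the sketch propagates per-word POS tag distributions as a stand-in for the lexical emission probabilities; the record does not give the paper's actual graph construction or update rule.

```python
from typing import Dict, List

Dist = Dict[str, float]


def normalize(d: Dist) -> Dist:
    """Scale the values of d so they sum to 1 (returns a copy if the sum is 0)."""
    total = sum(d.values())
    if total == 0:
        return dict(d)
    return {k: v / total for k, v in d.items()}


def propagate(
    graph: Dict[str, Dict[str, float]],
    seeds: Dict[str, Dist],
    tags: List[str],
    iterations: int = 20,
) -> Dict[str, Dist]:
    """Hypothetical label propagation over a word similarity graph.

    graph maps each word to its neighbors and edge weights (every neighbor
    must also be a key of graph). seeds holds tag distributions for words
    seen in labeled data; these stay clamped. Every other word repeatedly
    takes the similarity-weighted average of its neighbors' distributions.
    """
    uniform = {t: 1.0 / len(tags) for t in tags}
    dists = {w: dict(seeds.get(w, uniform)) for w in graph}
    for _ in range(iterations):
        updated = {}
        for word, neighbors in graph.items():
            if word in seeds:
                # Labeled words keep their observed distribution.
                updated[word] = dict(seeds[word])
                continue
            acc = {t: 0.0 for t in tags}
            for nb, weight in neighbors.items():
                for t in tags:
                    acc[t] += weight * dists[nb].get(t, 0.0)
            # Fall back to uniform when a word has no informative neighbors.
            updated[word] = normalize(acc) if sum(acc.values()) > 0 else dict(uniform)
        dists = updated
    return dists


# Toy data (assumed): one unknown word linked to two labeled words.
tags = ["NN", "VV"]
graph = {
    "known_noun": {"unknown": 0.9},
    "known_verb": {"unknown": 0.1},
    "unknown": {"known_noun": 0.9, "known_verb": 0.1},
}
seeds = {
    "known_noun": {"NN": 1.0, "VV": 0.0},
    "known_verb": {"NN": 0.0, "VV": 1.0},
}
result = propagate(graph, seeds, tags)
```

In this toy setup the unknown word ends up with a distribution dominated by the tag of its most similar labeled neighbor; a parser could then use such a derived distribution in place of a generic unknown-word estimate.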
Record ID: doaj-art-7ae2eff6421d495d8b55e6babe2d354a
Institution: Kabale University
ISSN: 2356-6140, 1537-744X
Author Affiliations: NLP2CT Laboratory, Department of Computer and Information Science, University of Macau, Macau (all four authors)
Volume: 2014
Article ID: 959328