Chinese Unknown Word Recognition for PCFG-LA Parsing
This paper investigates the recognition of unknown words in Chinese parsing. Two methods are proposed to handle this problem. One is the modification of a character-based model. We model the emission probability of an unknown word using the first and last characters in the word. It aims to reduce th...
Main Authors: | Qiuping Huang, Liangye He, Derek F. Wong, Lidia S. Chao |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2014-01-01 |
Series: | The Scientific World Journal |
Online Access: | http://dx.doi.org/10.1155/2014/959328 |
_version_ | 1832568006909100032 |
---|---|
author | Qiuping Huang, Liangye He, Derek F. Wong, Lidia S. Chao |
author_sort | Qiuping Huang |
collection | DOAJ |
description | This paper investigates the recognition of unknown words in Chinese parsing and proposes two methods to handle them. The first modifies a character-based model: the emission probability of an unknown word is modeled using the first and last characters of the word, with the aim of reducing the POS tag ambiguity of unknown words and thereby improving parsing performance. The second is a novel method based on graph-based semisupervised learning (SSL), whose goal is to discover additional lexical knowledge from a large amount of unlabeled data. It builds similarity graphs over the words of the labeled and unlabeled data and propagates lexical emission probabilities to the unknown words; the derived distributions are then incorporated into the parsing process. Empirical results on the Penn Chinese Treebank and the TCT Treebank show that both methods are effective in handling unknown words and improve parsing. |
format | Article |
id | doaj-art-7ae2eff6421d495d8b55e6babe2d354a |
institution | Kabale University |
issn | 2356-6140 1537-744X |
language | English |
publishDate | 2014-01-01 |
publisher | Wiley |
record_format | Article |
series | The Scientific World Journal |
spelling | doaj-art-7ae2eff6421d495d8b55e6babe2d354a; 2025-02-03T00:59:57Z; eng; Wiley; The Scientific World Journal; 2356-6140; 1537-744X; 2014-01-01; 2014; 10.1155/2014/959328; 959328; Chinese Unknown Word Recognition for PCFG-LA Parsing; Qiuping Huang, Liangye He, Derek F. Wong, Lidia S. Chao (all: NLP2CT Laboratory, Department of Computer and Information Science, University of Macau, Macau); http://dx.doi.org/10.1155/2014/959328 |
title | Chinese Unknown Word Recognition for PCFG-LA Parsing |
url | http://dx.doi.org/10.1155/2014/959328 |
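
The two ideas named in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record gives no formulas, so the function names (`char_emission`, `propagate`), the use of a tag distribution conditioned on the first and last characters as a stand-in for the emission model, and the plain weighted-average update rule are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def char_emission(tagged_words):
    """Estimate a tag distribution for each (first char, last char) pair.

    Assumption: this relative-frequency estimate stands in for the paper's
    character-based emission model, whose exact form is not given here.
    """
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[(word[0], word[-1])][tag] += 1
    return {
        key: {tag: n / sum(c.values()) for tag, n in c.items()}
        for key, c in counts.items()
    }


def propagate(graph, seeds, tags, iterations=10):
    """Spread tag distributions over a word-similarity graph.

    graph: {word: {neighbour: similarity weight}} (hypothetical format)
    seeds: {word: {tag: prob}} for words seen in labeled data; kept fixed
    Unlabeled words repeatedly take the weighted average of their
    neighbours' distributions.
    """
    uniform = {t: 1.0 / len(tags) for t in tags}
    dist = {w: dict(seeds.get(w, uniform)) for w in graph}
    for _ in range(iterations):
        updated = {}
        for word, nbrs in graph.items():
            total = sum(nbrs.values())
            if word in seeds or total == 0:
                # Labeled or isolated words keep their current distribution.
                updated[word] = dist[word]
                continue
            updated[word] = {
                t: sum(wt * dist.get(n, uniform).get(t, 0.0)
                       for n, wt in nbrs.items()) / total
                for t in tags
            }
        dist = updated
    return dist
```

In this sketch an unknown word would first be looked up by its first and last characters, and the propagated distribution from the graph would supply additional evidence for words that appear only in unlabeled text. How the two sources are combined inside the PCFG-LA parser is not specified in this record.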