Unleashing the power of pinyin: promoting Chinese named entity recognition with multiple embedding and attention
Main Authors: | , , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | Springer, 2025-01-01 |
Series: | Complex & Intelligent Systems |
Subjects: | |
Online Access: | https://doi.org/10.1007/s40747-024-01753-0 |
Summary: | Named Entity Recognition (NER) aims to identify entities with specific meanings, along with their boundaries, in natural language text. Because Chinese and English belong to different language families, Chinese NER faces additional challenges such as ambiguous word boundaries and semantic diversity. Previous studies on Chinese NER have focused on character and lexical information and have neglected a feature unique to Chinese: pinyin. In this paper, we propose CPL-NER, which combines multiple features of Chinese characters as embeddings and enhances the semantic representation by introducing pinyin and dictionary information. Pinyin information helps to resolve polyphonic characters, while dictionary information helps to address word segmentation ambiguities. We also design the Pinyin-Lexicon Cross-Attention Mechanism (PLCA), which calculates attention scores between the different embeddings. This mechanism deeply integrates character, pinyin, and lexicon embeddings, generating character sequences enriched with semantic information. Finally, a BiLSTM-CRF is employed for sequence modeling. This design captures semantic features in Chinese text more comprehensively, improving the model's handling of polyphonic characters and word segmentation ambiguities and thereby its recognition of Chinese named entities. Experiments on four standard Chinese NER benchmark datasets show that our method outperforms most baselines, demonstrating the effectiveness of the proposed model. |
---|---|
ISSN: | 2199-4536 2198-6053 |
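
The summary describes attention scores computed between character, pinyin, and lexicon embeddings. The record does not give the actual PLCA formulation, so the following is only a minimal sketch of generic scaled dot-product cross-attention, assuming character vectors act as queries over pinyin and lexicon vectors and that the results are fused by simple addition. The function names (`cross_attention`, `fuse`), the additive fusion, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query row attends over all key rows."""
    d = len(keys[0])
    attended = []
    for q in queries:
        # Similarity of this query to each key, scaled by sqrt(dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value rows.
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return attended


def fuse(char_emb, pinyin_emb, lexicon_emb):
    """Hypothetical fusion (an assumption, not the paper's method):
    character vectors plus pinyin-attended and lexicon-attended context."""
    from_pinyin = cross_attention(char_emb, pinyin_emb, pinyin_emb)
    from_lexicon = cross_attention(char_emb, lexicon_emb, lexicon_emb)
    return [
        [c + p + l for c, p, l in zip(c_row, p_row, l_row)]
        for c_row, p_row, l_row in zip(char_emb, from_pinyin, from_lexicon)
    ]


# Toy 2-dimensional embeddings for a 3-character sentence (made-up numbers).
char_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pinyin_emb = [[0.5, 0.1], [0.2, 0.9], [0.7, 0.7]]
lexicon_emb = [[0.3, 0.3], [0.9, 0.1]]  # two matched lexicon words

fused = fuse(char_emb, pinyin_emb, lexicon_emb)
```

In a full model along the lines the summary describes, a fused sequence like `fused` would then be passed to a sequence labeler such as a BiLSTM-CRF; that stage is omitted here.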