Unleashing the power of pinyin: promoting Chinese named entity recognition with multiple embedding and attention
Main Authors: | Jigui Zhao, Yurong Qian, Shuxiang Hou, Jiayin Chen, Kui Wang, Min Liu, Aizimaiti Xiaokaiti |
---|---|
Format: | Article |
Language: | English |
Published: | Springer, 2025-01-01 |
Series: | Complex & Intelligent Systems |
Online Access: | https://doi.org/10.1007/s40747-024-01753-0 |
---|---|
author | Jigui Zhao, Yurong Qian, Shuxiang Hou, Jiayin Chen, Kui Wang, Min Liu, Aizimaiti Xiaokaiti |
collection | DOAJ |
description | Named Entity Recognition (NER) aims to identify entities with specific meanings, together with their boundaries, in natural language text. Because Chinese and English belong to different language families, Chinese NER faces additional challenges such as ambiguous word boundaries and semantic diversity. Previous studies on Chinese NER have focused on character and lexical information while neglecting pinyin, a feature unique to Chinese. In this paper, we propose CPL-NER, which enhances the semantic representation of Chinese characters by combining character, pinyin, and dictionary (lexicon) information as embeddings. For Chinese NER, the pinyin of a character helps to resolve polyphony (characters with more than one pronunciation), while dictionary information helps to resolve word segmentation ambiguities. We further design a Pinyin-Lexicon Cross-Attention Mechanism (PLCA) that computes attention scores between the different embeddings and fuses character, pinyin, and lexicon embeddings into character sequences enriched with semantic information. Finally, a BiLSTM-CRF layer performs sequence modeling. This design captures semantic features in Chinese text more comprehensively, improving the model's handling of polyphonic characters and word segmentation ambiguities and thereby its Chinese NER performance. Experiments on four standard Chinese NER benchmark datasets show that our method outperforms most baselines, demonstrating the effectiveness of the proposed model. |
format | Article |
id | doaj-art-34050a5f7ee941a4aa41efb55a7e92f5 |
institution | Kabale University |
issn | 2199-4536, 2198-6053 |
language | English |
publishDate | 2025-01-01 |
publisher | Springer |
record_format | Article |
series | Complex & Intelligent Systems |
affiliations | School of Software, Xinjiang University (Jigui Zhao, Yurong Qian, Jiayin Chen, Aizimaiti Xiaokaiti); Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region (Shuxiang Hou); Xinjiang Rural Credit Union (Kui Wang); School of Business, Xinjiang University (Min Liu) |
title | Unleashing the power of pinyin: promoting Chinese named entity recognition with multiple embedding and attention |
topic | Named entity recognition; Chinese named entity recognition; Cross-attention; Pinyin enhancement; Lexicon enhancement |
url | https://doi.org/10.1007/s40747-024-01753-0 |
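The abstract describes fusing character, pinyin, and lexicon embeddings through cross-attention before a BiLSTM-CRF tagger, but this record does not give the PLCA equations. The following is therefore only a minimal sketch of generic scaled dot-product cross-attention under stated assumptions, not the authors' PLCA: character embeddings act as queries, the concatenated pinyin and lexicon embedding sequences serve as both keys and values, learned projection matrices are omitted, and the attended result is added back to each character vector. The names `cross_attention` and `fuse` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], memory: List[Vector]) -> List[Vector]:
    """Scaled dot-product cross-attention with keys = values = memory.

    Assumption: no learned Q/K/V projections; the actual PLCA formulation
    is not given in the catalog record.
    """
    if not queries:
        return []
    if not memory:
        raise ValueError("memory must contain at least one vector")
    d = len(queries[0])
    outputs = []
    for q in queries:
        # One attention weight per memory vector, scaled by sqrt(d).
        weights = softmax([dot(q, k) / math.sqrt(d) for k in memory])
        # Weighted sum of the value vectors, dimension by dimension.
        outputs.append([sum(w * v[j] for w, v in zip(weights, memory)) for j in range(d)])
    return outputs


def fuse(chars: List[Vector], pinyin: List[Vector], lexicon: List[Vector]) -> List[Vector]:
    """Hypothetical fusion: each character attends over pinyin ++ lexicon,
    then the attended vector is added to the character vector (residual)."""
    attended = cross_attention(chars, pinyin + lexicon)
    return [[c + a for c, a in zip(cv, av)] for cv, av in zip(chars, attended)]


if __name__ == "__main__":
    chars = [[1.0, 0.0], [0.0, 1.0]]
    pinyin = [[1.0, 0.0], [0.0, 1.0]]
    lexicon = [[0.5, 0.5], [0.5, 0.5]]
    for row in fuse(chars, pinyin, lexicon):
        print([round(x, 3) for x in row])
```

In the full model as the abstract describes it, the fused vectors would then be passed to a BiLSTM-CRF tagger. That stage, and the learned projections a trainable attention layer would normally include, are left out of this sketch to keep the attention step itself visible.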