Unleashing the power of pinyin: promoting Chinese named entity recognition with multiple embedding and attention

Abstract Named Entity Recognition (NER) aims to identify entities with specific meanings and their boundaries in natural language texts. Due to the differences between Chinese and English language families, Chinese NER faces challenges such as ambiguous word boundary delineation and semantic diversi...

Bibliographic Details
Main Authors: Jigui Zhao, Yurong Qian, Shuxiang Hou, Jiayin Chen, Kui Wang, Min Liu, Aizimaiti Xiaokaiti
Format: Article
Language: English
Published: Springer 2025-01-01
Series: Complex & Intelligent Systems
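Subjects: Named entity recognition; Chinese named entity recognition; Cross-attention; Pinyin enhancement; Lexicon enhancement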
Online Access: https://doi.org/10.1007/s40747-024-01753-0
_version_ 1832571155112787968
author Jigui Zhao
Yurong Qian
Shuxiang Hou
Jiayin Chen
Kui Wang
Min Liu
Aizimaiti Xiaokaiti
author_facet Jigui Zhao
Yurong Qian
Shuxiang Hou
Jiayin Chen
Kui Wang
Min Liu
Aizimaiti Xiaokaiti
author_sort Jigui Zhao
collection DOAJ
description Abstract: Named Entity Recognition (NER) aims to identify entities with specific meanings, together with their boundaries, in natural language texts. Because Chinese and English belong to different language families, Chinese NER faces challenges such as ambiguous word boundary delineation and semantic diversity. Previous studies on Chinese NER have focused on character and lexical information, neglecting a feature unique to Chinese: pinyin information. In this paper, we propose CPL-NER, which combines multiple features of Chinese characters as embeddings, introducing pinyin and dictionary information to enhance the semantic representation. For Chinese named entity recognition, the pinyin information of Chinese characters helps to resolve the polyphonic phenomenon, while dictionary information helps to address word segmentation ambiguities. Additionally, we design the Pinyin-Lexicon Cross-Attention Mechanism (PLCA), which calculates attention scores between the various embeddings. This mechanism deeply integrates character, pinyin, and lexicon embeddings, generating character sequences enriched with semantic information. Finally, BiLSTM-CRF is employed for sequence modeling. With this design, the model captures semantic features in Chinese text more comprehensively and handles polyphonic characters and word segmentation ambiguities better, thereby improving recognition performance on Chinese named entities. We conducted experiments on four standard Chinese NER benchmark datasets, and the results show that our method outperforms most baselines, demonstrating the effectiveness of the proposed model.
format Article
id doaj-art-34050a5f7ee941a4aa41efb55a7e92f5
institution Kabale University
issn 2199-4536
2198-6053
language English
publishDate 2025-01-01
publisher Springer
record_format Article
series Complex & Intelligent Systems
spelling doaj-art-34050a5f7ee941a4aa41efb55a7e92f5
2025-02-02T12:49:59Z
eng
Springer
Complex & Intelligent Systems
2199-4536
2198-6053
2025-01-01
11
1
1
13
10.1007/s40747-024-01753-0
Unleashing the power of pinyin: promoting Chinese named entity recognition with multiple embedding and attention
Jigui Zhao (School of Software, Xinjiang University)
Yurong Qian (School of Software, Xinjiang University)
Shuxiang Hou (Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region)
Jiayin Chen (School of Software, Xinjiang University)
Kui Wang (Xinjiang Rural Credit Union)
Min Liu (School of Business, Xinjiang University)
Aizimaiti Xiaokaiti (School of Software, Xinjiang University)
Abstract Named Entity Recognition (NER) aims to identify entities with specific meanings and their boundaries in natural language texts. Due to the differences between Chinese and English language families, Chinese NER faces challenges such as ambiguous word boundary delineation and semantic diversity. Previous studies on Chinese NER have focused on character and lexical information, neglecting the unique feature of Chinese: pinyin information. In this paper, we propose CPL-NER, which combines multiple feature information of Chinese characters as embedding to enhance the semantic representation by introducing pinyin and dictionary information. For Chinese named entity recognition, pinyin information of Chinese characters helps to resolve the polyphonic phenomenon, while dictionary information aids in addressing word segmentation ambiguities. Additionally, we innovatively designed the Pinyin-Lexicon Cross-Attention Mechanism (PLCA), which calculates attention scores between various embeddings. This mechanism deeply integrates character, pinyin, and lexicon embeddings, generating character sequences enriched with semantic information. Finally, BiLSTM-CRF is employed for sequence modeling. Through this design, we can more comprehensively capture semantic features in Chinese text, improving the model’s ability to handle polyphonic characters and word segmentation ambiguities, thereby enhancing the recognition performance of Chinese named entities. We conducted experiments on four standard Chinese NER benchmark datasets, and the results show that our method outperforms most baselines, demonstrating the effectiveness of our proposed model.
https://doi.org/10.1007/s40747-024-01753-0
Named entity recognition
Chinese named entity recognition
Cross-attention
Pinyin enhancement
Lexicon enhancement
spellingShingle Jigui Zhao
Yurong Qian
Shuxiang Hou
Jiayin Chen
Kui Wang
Min Liu
Aizimaiti Xiaokaiti
Unleashing the power of pinyin: promoting Chinese named entity recognition with multiple embedding and attention
Complex & Intelligent Systems
Named entity recognition
Chinese named entity recognition
Cross-attention
Pinyin enhancement
Lexicon enhancement
title Unleashing the power of pinyin: promoting Chinese named entity recognition with multiple embedding and attention
title_full Unleashing the power of pinyin: promoting Chinese named entity recognition with multiple embedding and attention
title_fullStr Unleashing the power of pinyin: promoting Chinese named entity recognition with multiple embedding and attention
title_full_unstemmed Unleashing the power of pinyin: promoting Chinese named entity recognition with multiple embedding and attention
title_short Unleashing the power of pinyin: promoting Chinese named entity recognition with multiple embedding and attention
title_sort unleashing the power of pinyin promoting chinese named entity recognition with multiple embedding and attention
topic Named entity recognition
Chinese named entity recognition
Cross-attention
Pinyin enhancement
Lexicon enhancement
url https://doi.org/10.1007/s40747-024-01753-0
work_keys_str_mv AT jiguizhao unleashingthepowerofpinyinpromotingchinesenamedentityrecognitionwithmultipleembeddingandattention
AT yurongqian unleashingthepowerofpinyinpromotingchinesenamedentityrecognitionwithmultipleembeddingandattention
AT shuxianghou unleashingthepowerofpinyinpromotingchinesenamedentityrecognitionwithmultipleembeddingandattention
AT jiayinchen unleashingthepowerofpinyinpromotingchinesenamedentityrecognitionwithmultipleembeddingandattention
AT kuiwang unleashingthepowerofpinyinpromotingchinesenamedentityrecognitionwithmultipleembeddingandattention
AT minliu unleashingthepowerofpinyinpromotingchinesenamedentityrecognitionwithmultipleembeddingandattention
AT aizimaitixiaokaiti unleashingthepowerofpinyinpromotingchinesenamedentityrecognitionwithmultipleembeddingandattention
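
Illustrative sketch (not part of the catalogued record): the description field says that character, pinyin, and lexicon embeddings are integrated through attention scores computed between them. The record does not give the actual PLCA formulation, so the code below is only a minimal sketch of generic scaled dot-product cross-attention, assuming that character embeddings act as queries over the pinyin and lexicon embeddings and that the results are summed. The function names, dimensions, random vectors, and the additive fusion are all hypothetical choices made for illustration, and the BiLSTM-CRF stage is omitted.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query row attends over all key rows."""
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return attended


def fuse(char_emb, pinyin_emb, lexicon_emb):
    """Hypothetical fusion: characters query pinyin and lexicon, results are added.

    This is an assumption for illustration, not the published PLCA definition.
    """
    from_pinyin = cross_attention(char_emb, pinyin_emb, pinyin_emb)
    from_lexicon = cross_attention(char_emb, lexicon_emb, lexicon_emb)
    return [
        [c + p + l for c, p, l in zip(c_row, p_row, l_row)]
        for c_row, p_row, l_row in zip(char_emb, from_pinyin, from_lexicon)
    ]


def random_matrix(rows, dim, rng):
    """Stand-in for learned embeddings: rows x dim values drawn from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(rows)]


rng = random.Random(0)
seq_len, dim = 4, 8                           # 4 characters, 8-dimensional embeddings
char_emb = random_matrix(seq_len, dim, rng)
pinyin_emb = random_matrix(seq_len, dim, rng)  # one pinyin vector per character
lexicon_emb = random_matrix(6, dim, rng)       # 6 matched lexicon words (made up)

fused = fuse(char_emb, pinyin_emb, lexicon_emb)
print(len(fused), len(fused[0]))  # one fused vector per character
```

In a full tagger, the fused per-character vectors would then be passed to a sequence model such as the BiLSTM-CRF named in the description.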