Named entity recognition based on span and category enhancement for Chinese news
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | Chinese (zho) |
Published: | POSTS&TELECOM PRESS Co., LTD, 2024-12-01 |
Series: | 智能科学与技术学报 (Chinese Journal of Intelligent Science and Technology) |
Subjects: | |
Online Access: | http://www.cjist.com.cn/zh/article/doi/10.11959/j.issn.2096-6652.202437/ |
Summary: | In news text, named entity recognition is complicated by complex syntactic structures and long entity names, which make entity boundaries hard to determine and lead to interrupted predictions of long entities when sequence labeling methods are used. To address these challenges, a model named SpaCE (span and category enhancement for Chinese news named entity recognition) was proposed. The model was built on the pre-trained BERT (bidirectional encoder representations from Transformers) model and was enhanced with span prediction and category descriptions to improve recognition performance. During the encoding of news text, category descriptions were incorporated to enrich semantic knowledge, and a span-based decoding method was adopted to avoid interrupted predictions of long entities. In addition, word boundary information was introduced through precise labeling, and the entity matching strategy was optimized, which reduced the non-entity matches caused by span decoding. Compared with baseline models, SpaCE achieved improved performance on three datasets. SpaCE also showed strong named entity recognition capability on disordered texts, indicating its robustness. |
---|---|
ISSN: | 2096-6652 |
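
The summary mentions span-based decoding combined with word boundary information to reduce non-entity matches. The sketch below illustrates that general idea only; it is not the SpaCE implementation, which this record does not describe in detail. The function name `decode_spans`, the nearest-end matching rule, the probability threshold, and the `word_starts`/`word_ends` sets (assumed to come from any Chinese word segmenter) are all hypothetical choices made for illustration.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start index, inclusive end index, category)


def decode_spans(
    start_probs: List[float],
    end_probs: List[float],
    category: str,
    word_starts: Set[int],
    word_ends: Set[int],
    threshold: float = 0.5,
    max_len: int = 20,
) -> List[Span]:
    """Pair each predicted start with the nearest valid end at or after it.

    Assumption: a span is kept only if it begins and ends on a segmented
    word boundary. This stands in for the boundary-aware matching that the
    summary mentions and is not taken from the paper.
    """
    starts = [i for i, p in enumerate(start_probs) if p >= threshold]
    ends = [i for i, p in enumerate(end_probs) if p >= threshold]
    spans: List[Span] = []
    for s in starts:
        if s not in word_starts:
            continue  # start falls inside a word: treat as a non-entity match
        for e in ends:  # ends are in ascending order
            if e < s:
                continue
            if e - s + 1 > max_len:
                break  # every later end would be even longer
            if e in word_ends:
                spans.append((s, e, category))
                break  # the nearest valid end wins
    return spans


# Toy example with hand-written scores for one category ("ORG").
text = "新华社记者在北京报道"
# Hypothetical segmentation: 新华社 / 记者 / 在 / 北京 / 报道
word_starts = {0, 3, 5, 6, 8}
word_ends = {2, 4, 5, 7, 9}
start_probs = [0.9, 0.7, 0.1, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0]
end_probs = [0.0, 0.1, 0.8, 0.0, 0.0, 0.0, 0.0, 0.2, 0.0, 0.0]

org_spans = decode_spans(start_probs, end_probs, "ORG", word_starts, word_ends)
print([text[s : e + 1] for s, e, _ in org_spans])
```

In this toy input the spurious start at index 1 is discarded because it falls inside a segmented word, so only one span survives. Because each entity is emitted as a single (start, end) pair rather than a chain of per-token labels, one mislabeled token in the middle cannot split a long entity.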