Named entity recognition based on span and category enhancement for Chinese news

Bibliographic Details
Main Authors: QI Ruiyan, LI Longjie, XU Shicheng, MA Ligong, MA Zhixin
Format: Article
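Language: Chinese (zho)
Published: POSTS&TELECOM PRESS Co., LTD, 2024-12-01
Series: 智能科学与技术学报 (Chinese Journal of Intelligent Science and Technology)
ISSN: 2096-6652
Subjects: News named entity recognition; BERT; Span; Category enhancement; Word boundary information
Online Access: http://www.cjist.com.cn/zh/article/doi/10.11959/j.issn.2096-6652.202437/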
Abstract: In the news domain, named entity recognition is complicated by complex syntactic structures and long entity names. These make entity boundaries hard to determine and cause sequence labeling methods to produce interrupted (broken) predictions for long entities. To address these challenges, a model named SpaCE (span and category enhancement for Chinese news named entity recognition) is proposed. The model is built on the pre-trained BERT (bidirectional encoder representations from Transformers) model and is enhanced with span prediction and category descriptions to improve recognition performance. When news text is encoded, category descriptions are incorporated to add semantic knowledge, and a span-based decoding method is adopted to avoid interrupted predictions of long entities. In addition, word boundary information is introduced through precise labeling, and the entity matching strategy is optimized to reduce spurious non-entity matches produced by span decoding. Compared with baseline models, SpaCE achieved better performance on three datasets. SpaCE also showed strong named entity recognition capability on disordered text, indicating its robustness.
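The abstract contrasts sequence labeling, which can break a long entity apart, with span-based decoding. The minimal Python sketch below illustrates that contrast under stated assumptions; it is not SpaCE's implementation. The function names `decode_bio` and `decode_spans`, the 0.5 threshold, the optional `max_len` limit, and the "pair each start with the nearest end" matching rule are all hypothetical illustrations, since the record does not describe the paper's actual matching strategy. The example sentence, tags, and probabilities are invented.

```python
from typing import List, Optional, Sequence, Tuple

# (start index, end index inclusive, entity category)
Span = Tuple[int, int, str]


def decode_bio(tags: Sequence[str]) -> List[Span]:
    """Strict BIO decoding: an entity closes at the first tag that does not continue it."""
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        if tag.startswith("I-") and start is not None and tag[2:] == label:
            continue  # the open entity keeps growing
        if start is not None:
            spans.append((start, i - 1, label))  # any other tag closes the open entity
        # A B- tag opens a new entity; an O tag or an orphaned I- tag opens nothing.
        start, label = (i, tag[2:]) if tag.startswith("B-") else (None, None)
    if start is not None:
        spans.append((start, len(tags) - 1, label))
    return spans


def decode_spans(
    start_probs: Sequence[float],
    end_probs: Sequence[float],
    label: str,
    threshold: float = 0.5,
    max_len: Optional[int] = None,
) -> List[Span]:
    """Hypothetical matching rule: pair each predicted start with the nearest end at or after it."""
    starts = [i for i, p in enumerate(start_probs) if p >= threshold]
    ends = [i for i, p in enumerate(end_probs) if p >= threshold]
    spans: List[Span] = []
    for s in starts:
        for e in ends:  # ends are in ascending order, so the first e >= s is the nearest
            if e < s:
                continue
            if max_len is None or e - s + 1 <= max_len:
                spans.append((s, e, label))
            break  # only the nearest end is considered; an over-long span is dropped
    return spans


# Invented example: "The People's Bank of China issues a notice", one token per character.
text = "中国人民银行发布公告"
# Hypothetical tagger output in which character 3 (民) was mislabelled as O.
tags = ["B-ORG", "I-ORG", "I-ORG", "O", "I-ORG", "I-ORG", "O", "O", "O", "O"]
# Hypothetical start/end probabilities for the ORG category.
start_probs = [0.9, 0.1, 0.1, 0.1, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1]
end_probs = [0.1, 0.1, 0.1, 0.1, 0.1, 0.95, 0.1, 0.1, 0.1, 0.1]

for s, e, lab in decode_bio(tags):
    print(f"BIO  -> {lab}: {text[s:e + 1]}")  # BIO  -> ORG: 中国人
for s, e, lab in decode_spans(start_probs, end_probs, "ORG"):
    print(f"span -> {lab}: {text[s:e + 1]}")  # span -> ORG: 中国人民银行
```

Because the span decoder depends only on confident boundary predictions, one weak character inside a long name does not split the entity, whereas strict BIO decoding truncates it at the first O tag. In a category-description (query-style) setup, a decoder like `decode_spans` would plausibly be run once per category description. A length limit or a stricter pairing rule is one simple way to curb spurious non-entity matches, but the record does not say which rule SpaCE uses.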