Named Entity Recognition for Nepali: Data Sets and Algorithms

Bibliographic Details
Main Authors: Nobal Niraula, Jeevan Chapagain
Format: Article
Language: English
Published: LibraryPress@UF, 2022-05-01
Series: Proceedings of the International Florida Artificial Intelligence Research Society Conference
Subjects: named entity recognition; data set; Nepali; low-resource
Online Access: https://journals.flvc.org/FLAIRS/article/view/130725
Volume: 35
ISSN: 2334-0754, 2334-0762
DOI: 10.32473/flairs.v35i.130725
Affiliation: Nowa Lab (Nobal Niraula)

Description

The Named Entity Recognition (NER) task involves locating named entities (NEs) in free text and classifying them into predefined categories such as Person Name, Location and Organization. Although NER has been studied widely in resource-rich languages, it has not been studied thoroughly for Nepali, a resource-poor language. In this paper, we present a systematic study of NER for the Nepali language, with clear annotation guidelines that yield high inter-annotator agreement. The annotation produces EverestNER, the largest human-annotated NER data set for Nepali, with 24,587 entities in total. It contains 308,353 tokens in 15,798 sentences, annotated with five categories: Person, Location, Organization, Date and Event. We split EverestNER into EverestNER-train and EverestNER-test, which serve as the first benchmark data sets for evaluating Nepali NER systems. We release these benchmark data sets at https://github.com/nowalab/everest-ner to facilitate research on the Nepali language. We report a comprehensive evaluation of state-of-the-art neural and Transformer models on these data sets, and we discuss the remaining challenges in discovering NEs in Nepali.
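The record says that NER systems locate entity spans and are evaluated on EverestNER-test. It does not give the release file format or the exact scoring protocol. The sketch below is therefore hypothetical. It assumes that token tags follow the common BIO scheme, with illustrative labels such as B-PER and I-LOC for the five categories. It computes micro-averaged, exact-match entity precision, recall and F1, which is the usual CoNLL-style convention. The paper's own evaluation may differ.

```python
from typing import List, Set, Tuple

# Assumption: tags use the BIO scheme with hypothetical labels such as
# PER, LOC, ORG, DATE and EVENT; the released files may use other names.
Span = Tuple[str, int, int]  # (category, start token index, end index exclusive)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one sentence's BIO tag sequence."""
    spans: Set[Span] = set()
    label, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        prefix, _, cat = tag.partition("-")
        # Close the open span on "O", on a new "B-", or on an "I-" of another type.
        if label is not None and (prefix != "I" or cat != label):
            spans.add((label, start, i))
            label = None
        # Open a span on "B-"; a stray "I-" with no open span also starts one.
        if prefix == "B" or (prefix == "I" and label is None):
            label, start = cat, i
    return spans


def entity_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged exact-match entity precision, recall and F1."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)  # a span counts only if category and boundaries match
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy sentence, illustrative only and not taken from EverestNER:
    # "राम काठमाडौं गए" ("Ram went to Kathmandu")
    gold = [["B-PER", "B-LOC", "O"]]
    pred = [["B-PER", "B-ORG", "O"]]
    print(entity_prf(gold, pred))  # prints (0.5, 0.5, 0.5)
```

Exact matching gives no credit for a correctly bounded span with the wrong category. In the toy case, the model finds the person but labels the location as an organization. Precision and recall are therefore both one half.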