Named Entity Recognition in Bengali
This paper reports on a multi-engine approach to developing a Named Entity Recognition (NER) system for Bengali that combines Maximum Entropy (ME), Conditional Random Field (CRF) and Support Vector Machine (SVM) classifiers by means of weighted voting techniques. The trai...
Main Authors: | Asif Ekbal, Sivaji Bandyopadhyay |
---|---|
Format: | Article |
Language: | English |
Published: | Linköping University Electronic Press, 2010-02-01 |
Series: | Northern European Journal of Language Technology |
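Subjects: | Named Entity Recognition; Maximum Entropy; Conditional Random Field; Support Vector Machine; Weighted Voting; Bengali |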
Online Access: | https://nejlt.ep.liu.se/article/view/1650 |
_version_ | 1832590640971513856 |
---|---|
author | Asif Ekbal; Sivaji Bandyopadhyay |
author_facet | Asif Ekbal; Sivaji Bandyopadhyay |
author_sort | Asif Ekbal |
collection | DOAJ |
description |
This paper reports on a multi-engine approach to developing a Named Entity Recognition (NER) system for Bengali that combines Maximum Entropy (ME), Conditional Random Field (CRF) and Support Vector Machine (SVM) classifiers by means of weighted voting techniques. The training set consists of approximately 272K wordforms, of which 150K wordforms have been manually annotated with the four major named entity (NE) tags, namely Person name, Location name, Organization name and Miscellaneous name. An appropriate tag conversion routine has been defined to convert the 122K wordforms of the IJCNLP-08 NER Shared Task on South and South East Asian Languages (NERSSEAL) data into the desired forms. The individual classifiers use different contextual information of the words along with a variety of features that help to predict the various NE classes. Lexical context patterns, generated semi-automatically from an unlabeled corpus of 3 million wordforms, have been used as features of the classifiers to improve their performance. In addition, we propose a number of techniques to post-process the output of each classifier in order to reduce errors and improve performance further. Finally, we use three weighted voting techniques to combine the individual models. Experimental results show the effectiveness of the proposed multi-engine approach, with overall Recall, Precision and F-Score values of 93.98%, 90.63% and 92.28%, respectively. This is an improvement of 14.92% in F-Score over the best-performing baseline, the SVM-based system, and of 18.36% in F-Score over the lowest-performing baseline, the ME-based system. Comparative evaluation results also show that the proposed system outperforms the three other existing Bengali NER systems.
|
format | Article |
id | doaj-art-ea07c84e84864ca69e0ae661c715de7f |
institution | Kabale University |
issn | 2000-1533 |
language | English |
publishDate | 2010-02-01 |
publisher | Linköping University Electronic Press |
record_format | Article |
series | Northern European Journal of Language Technology |
spelling | doaj-art-ea07c84e84864ca69e0ae661c715de7f; 2025-01-23T10:36:34Z; eng; Linköping University Electronic Press; Northern European Journal of Language Technology; 2000-1533; 2010-02-01; 1; 10.3384/nejlt.2000-1533.091226; Named Entity Recognition in Bengali; Asif Ekbal (Department of Computational Linguistics, University of Heidelberg, Germany); Sivaji Bandyopadhyay (Department of Computer Science and Engineering, Jadavpur University, India); https://nejlt.ep.liu.se/article/view/1650; Named Entity Recognition; Maximum Entropy; Conditional Random Field; Support Vector Machine; Weighted Voting; Bengali |
title | Named Entity Recognition in Bengali |
topic | Named Entity Recognition; Maximum Entropy; Conditional Random Field; Support Vector Machine; Weighted Voting; Bengali |
url | https://nejlt.ep.liu.se/article/view/1650 |
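
The description above says that the ME, CRF and SVM outputs are combined by weighted voting and evaluated with Recall, Precision and F-Score. The record does not specify the three voting schemes, so the following is only a minimal sketch of one generic form of weighted voting, not the authors' method. The function names, the tag sequences and the classifier weights are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine per-token tags from several classifiers.

    predictions: dict mapping classifier name -> list of tags (one per token).
    weights:     dict mapping classifier name -> numeric weight (hypothetical).
    For each token, every classifier adds its weight to the tag it proposed;
    the tag with the largest total wins.
    """
    n_tokens = len(next(iter(predictions.values())))
    combined = []
    for i in range(n_tokens):
        scores = defaultdict(float)
        for name, tags in predictions.items():
            scores[tags[i]] += weights[name]
        # Iterating over sorted tags makes ties deterministic:
        # the alphabetically first tag among the top scorers is chosen.
        combined.append(max(sorted(scores), key=lambda tag: scores[tag]))
    return combined


def f_score(precision, recall):
    """Balanced F-Score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Illustrative inputs only: these tags and weights are invented, not from the paper.
example_predictions = {
    "ME": ["PER", "O", "LOC"],
    "CRF": ["PER", "LOC", "LOC"],
    "SVM": ["O", "LOC", "ORG"],
}
example_weights = {"ME": 0.20, "CRF": 0.35, "SVM": 0.45}

if __name__ == "__main__":
    print(weighted_vote(example_predictions, example_weights))
    print(round(f_score(90.63, 93.98), 2))
```

With the Precision and Recall quoted in the description (90.63% and 93.98%), `f_score` gives roughly 92.27, which agrees with the reported 92.28% up to rounding of the published figures.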