Named Entity Recognition in Bengali

Bibliographic Details
Main Authors: Asif Ekbal, Sivaji Bandyopadhyay
Format: Article
Language: English
Published: Linköping University Electronic Press 2010-02-01
Series: Northern European Journal of Language Technology
Subjects:
Online Access: https://nejlt.ep.liu.se/article/view/1650
collection DOAJ
description This paper reports a multi-engine approach to developing a Named Entity Recognition (NER) system for Bengali that combines Maximum Entropy (ME), Conditional Random Field (CRF) and Support Vector Machine (SVM) classifiers through weighted voting. The training set consists of approximately 272K wordforms, of which 150K were manually annotated with four major named entity (NE) tags: Person name, Location name, Organization name and Miscellaneous name. A tag conversion routine was defined to convert the 122K wordforms of the IJCNLP-08 NER Shared Task on South and South East Asian Languages (NERSSEAL) data into this tagset. Each classifier uses contextual information about the words together with a variety of features that help predict the NE classes. Lexical context patterns, generated semi-automatically from an unlabeled corpus of 3 million wordforms, are also used as classifier features to improve performance. In addition, we propose several techniques for post-processing the output of each classifier to reduce errors and further improve performance. Finally, we use three weighted voting techniques to combine the individual models. Experimental results show the effectiveness of the multi-engine approach, with overall Recall, Precision and F-Score of 93.98%, 90.63% and 92.28%, respectively. This is an F-Score improvement of 14.92% over the best-performing baseline (the SVM-based system) and of 18.36% over the worst-performing baseline (the ME-based system). A comparative evaluation also shows that the proposed system outperforms three other existing Bengali NER systems.
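The record does not say how the three weighted voting schemes assign their weights. As a minimal sketch, assuming each classifier's vote is weighted by its overall F-score on held-out data (one plausible choice, not necessarily the authors'), per-token weighted voting over the ME, CRF and SVM outputs could look like the following. The function names `f_score` and `weighted_vote`, the tag labels and the example weights are all hypothetical.

```python
from collections import defaultdict


def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_vote(predictions: dict[str, list[str]],
                  weights: dict[str, float]) -> list[str]:
    """Combine the tag sequences that several classifiers assign to one sentence.

    predictions maps a classifier name to its per-token tags; weights maps the
    same name to its vote weight (assumed here to be a held-out F-score).
    """
    lengths = {len(tags) for tags in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all classifiers must tag the same number of tokens")
    (n_tokens,) = lengths

    combined = []
    for i in range(n_tokens):
        # Sum the weights of all classifiers that proposed each tag for token i.
        scores = defaultdict(float)
        for name, tags in predictions.items():
            scores[tags[i]] += weights[name]
        # The tag with the highest total weight wins; on a tie, max() keeps the
        # first candidate in sorted order, i.e. the alphabetically smallest tag.
        combined.append(max(sorted(scores), key=lambda tag: scores[tag]))
    return combined


# Hypothetical outputs for a three-token sentence; tags and weights are
# made up for illustration and are not taken from the paper.
predictions = {
    "ME": ["PER", "O", "LOC"],
    "CRF": ["PER", "ORG", "O"],
    "SVM": ["O", "ORG", "LOC"],
}
weights = {"ME": 0.5, "CRF": 0.3, "SVM": 0.4}
print(weighted_vote(predictions, weights))  # ['PER', 'ORG', 'LOC']
```

Plugging the reported figures into `f_score(0.9063, 0.9398)` gives about 0.9227, consistent with the reported F-Score of 92.28% up to rounding of the inputs. With equal weights, summing votes per tag reduces to plain majority voting.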
id doaj-art-ea07c84e84864ca69e0ae661c715de7f
institution Kabale University
issn 2000-1533
doi 10.3384/nejlt.2000-1533.091226
affiliation Asif Ekbal: Department of Computational Linguistics, University of Heidelberg, Germany
Sivaji Bandyopadhyay: Department of Computer Science and Engineering, Jadavpur University, India
topic Named Entity Recognition
Maximum Entropy
Conditional Random Field
Support Vector Machine
Weighted Voting
Bengali