Named Entity Recognition in Bengali

This paper reports on a multi-engine approach to developing a Named Entity Recognition (NER) system for Bengali that combines Maximum Entropy (ME), Conditional Random Field (CRF) and Support Vector Machine (SVM) classifiers through weighted voting. The trai...

Bibliographic Details
Main Authors:	Asif Ekbal, Sivaji Bandyopadhyay
Format:	Article
Language:	English
Published:	Linköping University Electronic Press 2010-02-01
Series:	Northern European Journal of Language Technology
Subjects:	Named Entity Recognition Maximum Entropy Conditional Random Field Support Vector Machine Weighted Voting Bengali
Online Access:	https://nejlt.ep.liu.se/article/view/1650

collection	DOAJ
description	This paper reports on a multi-engine approach to developing a Named Entity Recognition (NER) system for Bengali that combines Maximum Entropy (ME), Conditional Random Field (CRF) and Support Vector Machine (SVM) classifiers through weighted voting. The training set consists of approximately 272K wordforms, of which 150K have been manually annotated with the four major named entity (NE) tags: Person name, Location name, Organization name and Miscellaneous name. A tag conversion routine has been defined to convert the 122K wordforms of the IJCNLP-08 NER Shared Task on South and South East Asian Languages (NERSSEAL) data into the desired form. The individual classifiers use different contextual information about the words, along with a variety of features that help predict the NE classes. Lexical context patterns, generated semi-automatically from an unlabeled corpus of 3 million wordforms, are used as classifier features to improve performance. In addition, we propose a number of techniques for post-processing the output of each classifier to reduce errors and improve performance further. Finally, we use three weighted voting techniques to combine the individual models. Experimental results show the effectiveness of the proposed multi-engine approach, with overall Recall, Precision and F-Score values of 93.98%, 90.63% and 92.28%, respectively. This is an improvement of 14.92% in F-Score over the best-performing baseline, the SVM-based system, and of 18.36% over the worst-performing baseline, the ME-based system. Comparative evaluation also shows that the proposed system outperforms three other existing Bengali NER systems.
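The description says the ME, CRF and SVM outputs are combined by weighted voting but does not spell out the three schemes here. As a minimal sketch of the general idea only, the Python below sums a per-classifier weight for each tag proposed for a token and keeps the tag with the highest total. The function name `weighted_vote`, the tag abbreviations (`PER`, `LOC`, `ORG`, `O`) and the weight values are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine per-token tag sequences from several classifiers.

    predictions: dict mapping classifier name -> list of tags (one per token)
    weights:     dict mapping classifier name -> non-negative vote weight
    Returns one list of tags, the weighted-majority tag for each token.
    """
    # Visit classifiers from heaviest to lightest so that a tie in total
    # weight goes to the tag proposed first by the heaviest classifier.
    names = sorted(predictions, key=lambda name: weights[name], reverse=True)
    lengths = {len(predictions[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("all classifiers must tag the same number of tokens")

    combined = []
    for i in range(lengths.pop()):
        scores = defaultdict(float)
        for name in names:
            scores[predictions[name][i]] += weights[name]
        # max() returns the first key with the highest score, and the dict
        # keeps insertion order, which gives the tie-break described above.
        combined.append(max(scores, key=scores.get))
    return combined


# Illustrative inputs only: made-up tags and made-up weights.
predictions = {
    "ME": ["PER", "O", "LOC", "O"],
    "CRF": ["PER", "O", "ORG", "O"],
    "SVM": ["O", "O", "ORG", "O"],
}
weights = {"ME": 1.0, "CRF": 1.5, "SVM": 2.0}

print(weighted_vote(predictions, weights))
```

In practice the weights would come from some measure of each classifier's reliability, for example a score on held-out data; that choice is an assumption of this sketch and is not stated in the record.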
id	doaj-art-ea07c84e84864ca69e0ae661c715de7f
institution	Kabale University
issn	2000-1533
doi	10.3384/nejlt.2000-1533.091226
author_affiliation	Asif Ekbal: Department of Computational Linguistics, University of Heidelberg, Germany; Sivaji Bandyopadhyay: Department of Computer Science and Engineering, Jadavpur University, India
