ABioNER: A BERT-Based Model for Arabic Biomedical Named-Entity Recognition

The web is loaded daily with a huge volume of data, mainly unstructured text, which significantly increases the need for information extraction and NLP systems. The named-entity recognition task is a key step towards efficiently understanding text data and saving time and effort. Being a w...

Bibliographic Details
Main Authors: Nada Boudjellal, Huaping Zhang, Asif Khan, Arshad Ahmad, Rashid Naseem, Jianyun Shang, Lin Dai
Format: Article
Language: English
Published: Wiley 2021-01-01
Series: Complexity
Online Access: http://dx.doi.org/10.1155/2021/6633213
collection DOAJ
description The web is loaded daily with a huge volume of data, mainly unstructured text, which significantly increases the need for information extraction and NLP systems. The named-entity recognition task is a key step towards efficiently understanding text data and saving time and effort. As a widely used language globally, English dominates the research conducted in this field, especially in the biomedical domain. Arabic, by contrast, suffers from a lack of resources. This work presents a BERT-based model that identifies biomedical named entities in Arabic text (specifically disease and treatment named entities) and investigates how effectively pretraining a monolingual BERT model on a small-scale biomedical dataset improves the model's understanding of Arabic biomedical text. The model was compared with two state-of-the-art models (namely, AraBERT and multilingual BERT cased), and it outperformed both with an F1-score of 85%.
id doaj-art-0e21feb03b784535bc9732d9fc69d248
institution Kabale University
issn 1076-2787
1099-0526
spelling doaj-art-0e21feb03b784535bc9732d9fc69d248 2025-02-03T06:07:37Z eng Wiley Complexity 1076-2787 1099-0526 2021-01-01 2021 10.1155/2021/6633213 6633213
ABioNER: A BERT-Based Model for Arabic Biomedical Named-Entity Recognition
Nada Boudjellal, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
Huaping Zhang, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
Asif Khan, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
Arshad Ahmad, Department of IT and Computer Science, Pak-Austria Fachhochschule: Institute of Applied Sciences & Technology, Haripur, Pakistan
Rashid Naseem, Department of IT and Computer Science, Pak-Austria Fachhochschule: Institute of Applied Sciences & Technology, Haripur, Pakistan
Jianyun Shang, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
Lin Dai, School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
http://dx.doi.org/10.1155/2021/6633213