ABioNER: A BERT-Based Model for Arabic Biomedical Named-Entity Recognition
The web is being loaded daily with a huge volume of data, mainly unstructured textual data, which significantly increases the need for information extraction and NLP systems. The named-entity recognition task is a key step towards efficiently understanding text data and saving time and effort. Being a w...
Main Authors: | Nada Boudjellal, Huaping Zhang, Asif Khan, Arshad Ahmad, Rashid Naseem, Jianyun Shang, Lin Dai |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley 2021-01-01 |
Series: | Complexity |
Online Access: | http://dx.doi.org/10.1155/2021/6633213 |
_version_ | 1832550116752359424 |
---|---|
author | Nada Boudjellal, Huaping Zhang, Asif Khan, Arshad Ahmad, Rashid Naseem, Jianyun Shang, Lin Dai |
author_facet | Nada Boudjellal, Huaping Zhang, Asif Khan, Arshad Ahmad, Rashid Naseem, Jianyun Shang, Lin Dai |
author_sort | Nada Boudjellal |
collection | DOAJ |
description | The web is being loaded daily with a huge volume of data, mainly unstructured textual data, which significantly increases the need for information extraction and NLP systems. The named-entity recognition task is a key step towards efficiently understanding text data and saving time and effort. Being a widely used language globally, English dominates most of the research conducted in this field, especially in the biomedical domain. Unlike other languages, Arabic suffers from a lack of resources. This work presents a BERT-based model to identify biomedical named entities in Arabic text data (specifically disease and treatment named entities) and investigates the effectiveness of pretraining a monolingual BERT model with a small-scale biomedical dataset on enhancing the model's understanding of Arabic biomedical text. The model's performance was compared with two state-of-the-art models (namely, AraBERT and multilingual BERT cased), and it outperformed both models with an F1-score of 85%. |
format | Article |
id | doaj-art-0e21feb03b784535bc9732d9fc69d248 |
institution | Kabale University |
issn | 1076-2787 1099-0526 |
language | English |
publishDate | 2021-01-01 |
publisher | Wiley |
record_format | Article |
series | Complexity |
spelling | doaj-art-0e21feb03b784535bc9732d9fc69d248; 2025-02-03T06:07:37Z; eng; Wiley; Complexity; 1076-2787; 1099-0526; 2021-01-01; 2021; 10.1155/2021/6633213; 6633213; ABioNER: A BERT-Based Model for Arabic Biomedical Named-Entity Recognition; Nada Boudjellal (0), Huaping Zhang (1), Asif Khan (2), Arshad Ahmad (3), Rashid Naseem (4), Jianyun Shang (5), Lin Dai (6); (0) School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; (1) School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; (2) School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; (3) Department of IT and Computer Science, Pak-Austria Fachhochschule: Institute of Applied Sciences & Technology, Haripur, Pakistan; (4) Department of IT and Computer Science, Pak-Austria Fachhochschule: Institute of Applied Sciences & Technology, Haripur, Pakistan; (5) School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; (6) School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; The web is being loaded daily with a huge volume of data, mainly unstructured textual data, which significantly increases the need for information extraction and NLP systems. The named-entity recognition task is a key step towards efficiently understanding text data and saving time and effort. Being a widely used language globally, English dominates most of the research conducted in this field, especially in the biomedical domain. Unlike other languages, Arabic suffers from a lack of resources. This work presents a BERT-based model to identify biomedical named entities in Arabic text data (specifically disease and treatment named entities) and investigates the effectiveness of pretraining a monolingual BERT model with a small-scale biomedical dataset on enhancing the model's understanding of Arabic biomedical text. The model's performance was compared with two state-of-the-art models (namely, AraBERT and multilingual BERT cased), and it outperformed both models with an F1-score of 85%.; http://dx.doi.org/10.1155/2021/6633213 |
spellingShingle | Nada Boudjellal, Huaping Zhang, Asif Khan, Arshad Ahmad, Rashid Naseem, Jianyun Shang, Lin Dai; ABioNER: A BERT-Based Model for Arabic Biomedical Named-Entity Recognition; Complexity |
title | ABioNER: A BERT-Based Model for Arabic Biomedical Named-Entity Recognition |
title_full | ABioNER: A BERT-Based Model for Arabic Biomedical Named-Entity Recognition |
title_fullStr | ABioNER: A BERT-Based Model for Arabic Biomedical Named-Entity Recognition |
title_full_unstemmed | ABioNER: A BERT-Based Model for Arabic Biomedical Named-Entity Recognition |
title_short | ABioNER: A BERT-Based Model for Arabic Biomedical Named-Entity Recognition |
title_sort | abioner a bert based model for arabic biomedical named entity recognition |
url | http://dx.doi.org/10.1155/2021/6633213 |
work_keys_str_mv | AT nadaboudjellal abionerabertbasedmodelforarabicbiomedicalnamedentityrecognition AT huapingzhang abionerabertbasedmodelforarabicbiomedicalnamedentityrecognition AT asifkhan abionerabertbasedmodelforarabicbiomedicalnamedentityrecognition AT arshadahmad abionerabertbasedmodelforarabicbiomedicalnamedentityrecognition AT rashidnaseem abionerabertbasedmodelforarabicbiomedicalnamedentityrecognition AT jianyunshang abionerabertbasedmodelforarabicbiomedicalnamedentityrecognition AT lindai abionerabertbasedmodelforarabicbiomedicalnamedentityrecognition |