Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data

Entity normalization, or entity linking in the general domain, is an information extraction task that aims to annotate/bind multiple words/expressions in raw text with semantic references, such as concepts of an ontology. An ontology consists minimally of a formally organized vocabulary or hierarchy...

Bibliographic Details
Main Authors: Arnaud Ferré, Mouhamadou Ba, Robert Bossy
Format: Article
Language: English
Published: BioMed Central 2019-06-01
Series: Genomics & Informatics
Subjects: biomedical text mining; entity normalization; ontology; word embedding
Online Access: http://genominfo.org/upload/pdf/gi-2019-17-2-e20.pdf
_version_ 1832572466960007168
author Arnaud Ferré
Mouhamadou Ba
Robert Bossy
collection DOAJ
description Entity normalization, or entity linking in the general domain, is an information extraction task that aims to annotate/bind multiple words/expressions in raw text with semantic references, such as concepts of an ontology. An ontology consists minimally of a formally organized vocabulary or hierarchy of terms, which captures knowledge of a domain. Presently, machine-learning methods, often coupled with distributional representations, achieve good performance. However, these require large training datasets, which are not always available, especially for tasks in specialized domains. CONTES (CONcept-TErm System) is a supervised method that addresses entity normalization with ontology concepts using small training datasets. CONTES has some limitations: it does not scale well with very large ontologies, it tends to overgeneralize predictions, and it lacks valid representations for out-of-vocabulary words. Here, we propose to assess different methods to reduce the dimensionality in the representation of the ontology. We also propose to calibrate parameters in order to make the predictions more accurate, and to address the problem of out-of-vocabulary words with a specific method.
format Article
id doaj-art-2dcffa9df61f4c95b39887f3a76ee59d
institution Kabale University
issn 2234-0742
language English
publishDate 2019-06-01
publisher BioMed Central
record_format Article
series Genomics & Informatics
spelling doaj-art-2dcffa9df61f4c95b39887f3a76ee59d | 2025-02-02T09:48:35Z | eng | BioMed Central | Genomics & Informatics | 2234-0742 | 2019-06-01 | 17210.5808/GI.2019.17.2.e20562 | Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data | Arnaud Ferré (0) | Mouhamadou Ba (1) | Robert Bossy (2) | (0) MaIAGE, INRA, Paris-Saclay University, 78350 Jouy-en-Josas, France | (1) MaIAGE, INRA, Paris-Saclay University, 78350 Jouy-en-Josas, France | (2) MaIAGE, INRA, Paris-Saclay University, 78350 Jouy-en-Josas, France | Entity normalization, or entity linking in the general domain, is an information extraction task that aims to annotate/bind multiple words/expressions in raw text with semantic references, such as concepts of an ontology. An ontology consists minimally of a formally organized vocabulary or hierarchy of terms, which captures knowledge of a domain. Presently, machine-learning methods, often coupled with distributional representations, achieve good performance. However, these require large training datasets, which are not always available, especially for tasks in specialized domains. CONTES (CONcept-TErm System) is a supervised method that addresses entity normalization with ontology concepts using small training datasets. CONTES has some limitations: it does not scale well with very large ontologies, it tends to overgeneralize predictions, and it lacks valid representations for out-of-vocabulary words. Here, we propose to assess different methods to reduce the dimensionality in the representation of the ontology. We also propose to calibrate parameters in order to make the predictions more accurate, and to address the problem of out-of-vocabulary words with a specific method. | http://genominfo.org/upload/pdf/gi-2019-17-2-e20.pdf | biomedical text mining | entity normalization | ontology | word embedding
title Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data
topic biomedical text mining
entity normalization
ontology
word embedding
url http://genominfo.org/upload/pdf/gi-2019-17-2-e20.pdf