Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data

Entity normalization, or entity linking in the general domain, is an information extraction task that aims to annotate or bind multiple words or expressions in raw text with semantic references, such as concepts of an ontology. An ontology consists minimally of a formally organized vocabulary or hierarchy...

Bibliographic Details
Main Authors:	Arnaud Ferré, Mouhamadou Ba, Robert Bossy
Format:	Article
Language:	English
Published:	BioMed Central 2019-06-01
Series:	Genomics & Informatics
Subjects:	biomedical text mining; entity normalization; ontology; word embedding
Online Access:	http://genominfo.org/upload/pdf/gi-2019-17-2-e20.pdf

_version_	1832572466960007168
author	Arnaud Ferré; Mouhamadou Ba; Robert Bossy
author_sort	Arnaud Ferré
collection	DOAJ
description	Entity normalization, or entity linking in the general domain, is an information extraction task that aims to annotate or bind multiple words or expressions in raw text with semantic references, such as concepts of an ontology. An ontology consists minimally of a formally organized vocabulary or hierarchy of terms that captures the knowledge of a domain. Currently, machine-learning methods, often coupled with distributional representations, achieve good performance. However, they require large training datasets, which are not always available, especially for tasks in specialized domains. CONTES (CONcept-TErm System) is a supervised method that addresses entity normalization with ontology concepts using small training datasets. CONTES has several limitations: it does not scale well to very large ontologies, it tends to overgeneralize its predictions, and it lacks valid representations for out-of-vocabulary words. Here, we propose to assess different methods for reducing the dimensionality of the ontology representation. We also propose to calibrate parameters to make the predictions more accurate, and to address the problem of out-of-vocabulary words with a specific method.
format	Article
id	doaj-art-2dcffa9df61f4c95b39887f3a76ee59d
institution	Kabale University
issn	2234-0742
language	English
publishDate	2019-06-01
publisher	BioMed Central
record_format	Article
series	Genomics & Informatics
spelling	doaj-art-2dcffa9df61f4c95b39887f3a76ee59d | 2025-02-02T09:48:35Z | eng | BioMed Central | Genomics & Informatics | 2234-0742 | 2019-06-01 | 17 | 2 | 10.5808/GI.2019.17.2.e20 | 562 | Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data | Arnaud Ferré (0); Mouhamadou Ba (1); Robert Bossy (2) | 0, 1, 2: MaIAGE, INRA, Paris-Saclay University, 78350 Jouy-en-Josas, France | http://genominfo.org/upload/pdf/gi-2019-17-2-e20.pdf | biomedical text mining; entity normalization; ontology; word embedding
title	Improving the CONTES method for normalizing biomedical text entities with concepts from an ontology with (almost) no training data
topic	biomedical text mining; entity normalization; ontology; word embedding
url	http://genominfo.org/upload/pdf/gi-2019-17-2-e20.pdf
