Opinion: Strategy of Semi-Automatically Annotating a Full-Text Corpus of
Main Author: | |
---|---|
Format: | Article |
Language: | English |
Published: | BioMed Central, 2018-12-01 |
Series: | Genomics & Informatics |
Subjects: | |
Online Access: | http://genominfo.org/upload/pdf/gi-2018-16-4-e40.pdf |
Summary: | There is a communal need for an annotated corpus consisting of the full texts of biomedical journal articles. In response to community needs, a prototype version of the full-text corpus of Genomics & Informatics, called GNI version 1.0, has recently been published, with 499 annotated full-text articles available as a corpus resource. However, GNI needs to be updated, as the texts were shallow-parsed and annotated with several existing parsers. I list issues associated with upgrading annotations and give an opinion on the methodology for developing the next version of the GNI corpus, based on a semi-automatic strategy for more linguistically rich corpus annotation. |
ISSN: | 2234-0742 |