GNI Corpus Version 1.0: Annotated Full-Text Corpus of Genomics & Informatics to Support Biomedical Information Extraction

Genomics & Informatics (NLM title abbreviation: Genomics Inform) is the official journal of the Korea Genome Organization. A text corpus for this journal, annotated with various levels of linguistic information, would be a valuable resource, as the process of information extraction requires syntactic...

Bibliographic Details
Main Authors:	So-Yeon Oh, Ji-Hyeon Kim, Seo-Jin Kim, Hee-Jo Nam, Hyun-Seok Park
Format:	Article
Language:	English
Published:	BioMed Central 2018-09-01
Series:	Genomics & Informatics
Subjects:	biomedical text mining; corpus linguistics; text analytics
Online Access:	http://genominfo.org/upload/pdf/gi-2018-16-3-75.pdf

ISSN:	2234-0742
DOI:	10.5808/GI.2018.16.3.75
Volume/Issue:	16(3), pages 75-77
Collection:	DOAJ
Record ID:	doaj-art-342d4b9b8a7b4fb9a8f0d9cc793e8b89
Affiliation:	Bioinformatics Laboratory, ELTEC College of Engineering, Ewha Womans University, Seoul 03760, Korea (all authors)
Description:	Genomics & Informatics (NLM title abbreviation: Genomics Inform) is the official journal of the Korea Genome Organization. A text corpus for this journal, annotated with various levels of linguistic information, would be a valuable resource, as the process of information extraction requires syntactic, semantic, and higher levels of natural language processing. In this study, we publish our new corpus, called GNI Corpus version 1.0, extracted and annotated from the full texts of Genomics & Informatics with an NLTK (Natural Language Toolkit)-based text mining script. The preliminary version of the corpus could be used as a training and testing set for a system that serves a variety of functions for future biomedical text mining.
