Self-Supervised Chinese Ontology Learning from Online Encyclopedias

Constructing an ontology manually is a time-consuming, error-prone, and tedious task. We present SSCO, a Chinese ontology built with self-supervised learning, which contains about 255 thousand concepts, 5 million entities, and 40 million facts. We explore the three largest online Chinese encyclopedias for o...

Bibliographic Details
Main Authors: Fanghuai Hu, Zhiqing Shao, Tong Ruan
Format: Article
Language: English
Published: Wiley 2014-01-01
Series: The Scientific World Journal
Online Access: http://dx.doi.org/10.1155/2014/848631
author Fanghuai Hu
Zhiqing Shao
Tong Ruan
collection DOAJ
description Constructing an ontology manually is a time-consuming, error-prone, and tedious task. We present SSCO, a Chinese ontology built with self-supervised learning, which contains about 255 thousand concepts, 5 million entities, and 40 million facts. We explore the three largest online Chinese encyclopedias for ontology learning and describe how to transfer their structured knowledge, including article titles, category labels, redirection pages, taxonomy systems, and InfoBox modules, into ontological form. To avoid the errors in the encyclopedias and enrich the learnt ontology, we also apply machine learning methods. First, we show, statistically and experimentally, that self-supervised machine learning is practicable for Chinese relation extraction (at least for synonymy and hyponymy), and we train self-supervised models (SVMs and CRFs) for synonymy extraction, concept-subconcept relation extraction, and concept-instance relation extraction; the advantage of our methods is that all training examples are generated automatically from the structural information of the encyclopedias and a few general heuristic rules. Finally, we evaluate SSCO in two respects, scale and precision: manual evaluation shows that the ontology has excellent precision, and a comparison with other well-known ontologies and knowledge bases indicates high coverage; the experimental results also indicate that the self-supervised models clearly enrich SSCO.
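The abstract's idea of generating training examples automatically from encyclopedia structure can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Article` record layout, the function names, the mapping of redirection pages to synonym candidates and of category labels to hypernym candidates, and the rule that a sentence mentioning both terms of a seed pair counts as a positive example. The paper's actual heuristic rules and its SVM/CRF features are not described in this record.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    # Hypothetical record layout; not the schema used by the authors.
    title: str
    redirects: list[str] = field(default_factory=list)
    categories: list[str] = field(default_factory=list)
    text: str = ""


def seed_pairs(articles):
    """Collect candidate relation pairs from structural fields only."""
    synonyms, hyponyms = set(), set()
    for article in articles:
        for redirect in article.redirects:
            # Assumption: a redirect title is a synonym of the article title.
            synonyms.add((redirect, article.title))
        for category in article.categories:
            # Assumption: a category label is a hypernym of the article title.
            hyponyms.add((article.title, category))
    return synonyms, hyponyms


def label_sentences(articles, pairs, relation):
    """Label every sentence that mentions both terms of a seed pair."""
    examples = []
    for article in articles:
        # Split on the Chinese full stop; a real system would use a tokenizer.
        for sentence in article.text.split("。"):
            for left, right in sorted(pairs):
                if left in sentence and right in sentence:
                    examples.append((sentence.strip(), left, right, relation))
    return examples
```

Sentences labeled this way could then serve as positive examples for a classifier or sequence labeler; negative examples and feature extraction are left out of the sketch.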
format Article
id doaj-art-e4c8dcabd71a47818c3cd8b1fd3b1673
institution Kabale University
issn 2356-6140
1537-744X
language English
publishDate 2014-01-01
publisher Wiley
record_format Article
series The Scientific World Journal
spelling doaj-art-e4c8dcabd71a47818c3cd8b1fd3b1673; 2025-02-03T06:44:18Z; eng; Wiley; The Scientific World Journal; 2356-6140; 1537-744X; 2014-01-01; 2014; 10.1155/2014/848631; 848631; Self-Supervised Chinese Ontology Learning from Online Encyclopedias; Fanghuai Hu, Zhiqing Shao, Tong Ruan (all: Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China); http://dx.doi.org/10.1155/2014/848631
title Self-Supervised Chinese Ontology Learning from Online Encyclopedias
url http://dx.doi.org/10.1155/2014/848631