Semi-automatic construction of heterogeneous data schema based on structure and context-aware recommendation

Bibliographic Details
Main Authors:	Nan Yin, Junheng Liang, Xi Guo, Xue Jiang, Jie He, Xiaotong Zhang
Format:	Article
Language:	English
Published:	Nature Portfolio 2025-02-01
Series:	Scientific Data
Online Access:	https://doi.org/10.1038/s41597-024-04196-x

Description
Summary:	Customizing the structure and format of scientific data facilitates the publication of diverse and heterogeneous data. Many data publishing platforms let users create self-designed schemas, which leads to schema proliferation and more intricate creation processes. To address these challenges, we present a semi-automatic method and system for constructing heterogeneous material data schemas based on structure and context-aware recommendation. We propose a schema fragment tree structure to represent data schemas with hierarchical relationships, transforming the recommendation task into subtree matching. Fragment index and semantic search techniques are introduced to identify candidate fragments, and a tree edit distance algorithm calculates similarity scores. Evaluated on the Data Schema Construction System, the algorithm outperforms the TF-IDF and BM25 baselines for schema matching in precision, recall, and F1-score. Workload reduction is measured against the effort required to create schemas without recommendation. Our recommendation improves schema creation efficiency by 50.5% and reduces schema proliferation by 16.5%.
ISSN:	2052-4463
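
Illustrative note: the summary describes scoring candidate schema fragments by a tree edit distance. The sketch below is not the authors' algorithm; it assumes a simplified top-down (Selkow-style) edit distance on ordered, labeled trees, in which roots are always matched and a child subtree is inserted or deleted as a whole. The names Node, tree_distance, and similarity, the unit edit costs, the normalization, and the example field labels are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A schema fragment node: a field label plus ordered child fields."""
    label: str
    children: tuple = ()


def size(node):
    """Number of nodes in the subtree rooted at `node`."""
    return 1 + sum(size(child) for child in node.children)


def tree_distance(a, b):
    """Top-down edit distance (assumed unit costs, not the paper's).

    Roots are matched (cost 1 if labels differ); child lists are aligned
    like strings, where inserting or deleting a child costs its subtree size.
    """
    relabel = 0 if a.label == b.label else 1
    m, n = len(a.children), len(b.children)
    # dp[i][j]: cost of turning the first i children of a into the first j of b
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + size(a.children[i - 1])
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + size(b.children[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(a.children[i - 1]),  # delete subtree
                dp[i][j - 1] + size(b.children[j - 1]),  # insert subtree
                dp[i - 1][j - 1]
                + tree_distance(a.children[i - 1], b.children[j - 1]),
            )
    return relabel + dp[m][n]


def similarity(a, b):
    """Map the distance to a score in (0, 1]; 1.0 means identical trees."""
    return 1.0 - tree_distance(a, b) / (size(a) + size(b))


# Hypothetical fragments: a partially built schema and two candidates.
query = Node("sample", (Node("composition"), Node("hardness")))
candidates = {
    "materials_fragment": Node(
        "sample", (Node("composition"), Node("hardness"), Node("density"))
    ),
    "lab_fragment": Node("experiment", (Node("operator"),)),
}

# Rank candidates from most to least similar to the query fragment.
ranked = sorted(
    candidates, key=lambda name: similarity(query, candidates[name]), reverse=True
)
print(ranked)
```

Under these assumed costs, the candidate that only adds one field to the query ranks above the structurally unrelated one. A production system would likely use a general tree edit distance and restrict scoring to candidates returned by an index or semantic search step, as the summary indicates.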

Similar Items