Semi-automatic construction of heterogeneous data schema based on structure and context-aware recommendation
Abstract Customizing the structure and format of scientific data facilitates the publication of diverse and heterogeneous data. Many data publishing platforms empower users to create self-designed schemas, leading to schema proliferation and more intricate creation processes. To address these challe...
Main Authors: | Nan Yin, Junheng Liang, Xi Guo, Xue Jiang, Jie He, Xiaotong Zhang |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio 2025-02-01 |
Series: | Scientific Data |
Online Access: | https://doi.org/10.1038/s41597-024-04196-x |
_version_ | 1832572030684233728 |
---|---|
author | Nan Yin Junheng Liang Xi Guo Xue Jiang Jie He Xiaotong Zhang |
author_facet | Nan Yin Junheng Liang Xi Guo Xue Jiang Jie He Xiaotong Zhang |
author_sort | Nan Yin |
collection | DOAJ |
description | Abstract Customizing the structure and format of scientific data facilitates the publication of diverse and heterogeneous data. Many data publishing platforms empower users to create self-designed schemas, leading to schema proliferation and more intricate creation processes. To address these challenges, we present a semi-automatic method and system for constructing heterogeneous material data schemas based on structure and context-aware recommendation. We propose a schema fragment tree structure to represent data schemas with hierarchical relationships, transforming the recommendation into subtree matching. Fragment index and semantic search techniques are introduced to identify candidate fragments, and a tree edit distance algorithm calculates similarity scores. Evaluated on the Data Schema Construction System, the algorithm outperforms baselines (TF-IDF and BM25 for schema matching) in precision, recall, and F1-score. The baseline for reduced workload refers to the effort required to create schemas without recommendation. Our recommendation improves schema creation efficiency by 50.5% and reduces schema proliferation by 16.5%. |
format | Article |
id | doaj-art-c6f9c16615b040c9a69eafd0e832bb07 |
institution | Kabale University |
issn | 2052-4463 |
language | English |
publishDate | 2025-02-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Data |
spelling | doaj-art-c6f9c16615b040c9a69eafd0e832bb07 2025-02-02T12:08:20Z eng Nature Portfolio Scientific Data 2052-4463 2025-02-01 12 1 1 14 10.1038/s41597-024-04196-x Semi-automatic construction of heterogeneous data schema based on structure and context-aware recommendation Nan Yin 0 Junheng Liang 1 Xi Guo 2 Xue Jiang 3 Jie He 4 Xiaotong Zhang 5 School of Computer and Communication Engineering, University of Science and Technology Beijing; School of Computer and Communication Engineering, University of Science and Technology Beijing; School of Computer and Communication Engineering, University of Science and Technology Beijing; Beijing Advanced Innovation Center for Materials Genome Engineering, Institute for Advanced Materials and Technology, University of Science and Technology Beijing; School of Computer and Communication Engineering, University of Science and Technology Beijing; School of Computer and Communication Engineering, University of Science and Technology Beijing Abstract Customizing the structure and format of scientific data facilitates the publication of diverse and heterogeneous data. Many data publishing platforms empower users to create self-designed schemas, leading to schema proliferation and more intricate creation processes. To address these challenges, we present a semi-automatic method and system for constructing heterogeneous material data schemas based on structure and context-aware recommendation. We propose a schema fragment tree structure to represent data schemas with hierarchical relationships, transforming the recommendation into subtree matching. Fragment index and semantic search techniques are introduced to identify candidate fragments, and a tree edit distance algorithm calculates similarity scores. Evaluated on the Data Schema Construction System, the algorithm outperforms baselines (TF-IDF and BM25 for schema matching) in precision, recall, and F1-score. The baseline for reduced workload refers to the effort required to create schemas without recommendation. Our recommendation improves schema creation efficiency by 50.5% and reduces schema proliferation by 16.5%. https://doi.org/10.1038/s41597-024-04196-x |
spellingShingle | Nan Yin Junheng Liang Xi Guo Xue Jiang Jie He Xiaotong Zhang Semi-automatic construction of heterogeneous data schema based on structure and context-aware recommendation Scientific Data |
title | Semi-automatic construction of heterogeneous data schema based on structure and context-aware recommendation |
title_full | Semi-automatic construction of heterogeneous data schema based on structure and context-aware recommendation |
title_fullStr | Semi-automatic construction of heterogeneous data schema based on structure and context-aware recommendation |
title_full_unstemmed | Semi-automatic construction of heterogeneous data schema based on structure and context-aware recommendation |
title_short | Semi-automatic construction of heterogeneous data schema based on structure and context-aware recommendation |
title_sort | semi automatic construction of heterogeneous data schema based on structure and context aware recommendation |
url | https://doi.org/10.1038/s41597-024-04196-x |
work_keys_str_mv | AT nanyin semiautomaticconstructionofheterogeneousdataschemabasedonstructureandcontextawarerecommendation AT junhengliang semiautomaticconstructionofheterogeneousdataschemabasedonstructureandcontextawarerecommendation AT xiguo semiautomaticconstructionofheterogeneousdataschemabasedonstructureandcontextawarerecommendation AT xuejiang semiautomaticconstructionofheterogeneousdataschemabasedonstructureandcontextawarerecommendation AT jiehe semiautomaticconstructionofheterogeneousdataschemabasedonstructureandcontextawarerecommendation AT xiaotongzhang semiautomaticconstructionofheterogeneousdataschemabasedonstructureandcontextawarerecommendation |
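
The abstract describes representing schemas as fragment trees and scoring candidates with a tree edit distance. The record does not say which variant the authors use, so the following is only a minimal sketch, assuming Selkow's top-down tree edit distance (relabel a node, or insert or delete a whole subtree) and a simple size-based normalization. The `Fragment` class, the field names, and the `similarity` formula are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Fragment:
    """Hypothetical schema fragment: a field name plus ordered child fields."""
    label: str
    children: Tuple["Fragment", ...] = ()


def size(t: Fragment) -> int:
    """Number of nodes in the fragment tree."""
    return 1 + sum(size(c) for c in t.children)


def distance(a: Fragment, b: Fragment) -> int:
    """Selkow-style top-down edit distance (an assumed variant, not
    necessarily the paper's): relabeling a node costs 1, inserting or
    deleting a subtree costs its node count."""
    relabel = 0 if a.label == b.label else 1
    m, n = len(a.children), len(b.children)
    # dp[i][j]: cost of turning the first i children of a
    # into the first j children of b.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + size(a.children[i - 1])
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + size(b.children[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(a.children[i - 1]),   # delete subtree
                dp[i][j - 1] + size(b.children[j - 1]),   # insert subtree
                dp[i - 1][j - 1]
                + distance(a.children[i - 1], b.children[j - 1]),  # match
            )
    return relabel + dp[m][n]


def similarity(a: Fragment, b: Fragment) -> float:
    """Assumed normalization: 1.0 for identical trees, lower otherwise."""
    return 1 - distance(a, b) / (size(a) + size(b))


# Two illustrative material-data fragments (invented field names).
existing = Fragment("sample", (
    Fragment("composition", (Fragment("element"), Fragment("fraction"))),
    Fragment("temperature"),
))
draft = Fragment("sample", (
    Fragment("composition", (Fragment("element"),)),
    Fragment("temperature"),
    Fragment("pressure"),
))

print(distance(existing, draft), similarity(existing, draft))
```

In a recommender of the kind the abstract outlines, a score like this would rank candidate fragments retrieved by the index and semantic search steps; those retrieval steps are not sketched here.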