Survey on Encoding Schemes for Genomic Data Representation and Feature Learning—From Signal Processing to Machine Learning
Data-driven machine learning, especially deep learning technology, is becoming an important tool for handling big data issues in bioinformatics. In machine learning, DNA sequences are often converted to numerical values for data representation and feature learning in various applications. Similar co...
Main Authors: | Ning Yu, Zhihua Li, Zeng Yu |
---|---|
Format: | Article |
Language: | English |
Published: | Tsinghua University Press, 2018-09-01 |
Series: | Big Data Mining and Analytics |
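Subjects: | encoding scheme; data representation; feature learning; deep learning; genomic signal processing; machine learning; genome analysis |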
Online Access: | https://www.sciopen.com/article/10.26599/BDMA.2018.9020018 |
_version_ | 1832572923107344384 |
---|---|
author | Ning Yu; Zhihua Li; Zeng Yu |
collection | DOAJ |
description | Data-driven machine learning, especially deep learning technology, is becoming an important tool for handling big data issues in bioinformatics. In machine learning, DNA sequences are often converted to numerical values for data representation and feature learning in various applications. A similar conversion occurs in Genomic Signal Processing (GSP), where genome sequences are transformed into numerical sequences for signal extraction and recognition. This kind of conversion is also called an encoding scheme. The choice of encoding scheme can greatly affect the performance of GSP applications and machine learning models. This paper aims to collect, analyze, discuss, and summarize the existing encoding schemes for genome sequences, particularly in GSP as well as in other genome analysis applications, to provide a comprehensive reference for genomic data representation and feature learning in machine learning. |
format | Article |
id | doaj-art-0da5127f96ed4180b5d5ee246faff785 |
institution | Kabale University |
issn | 2096-0654 |
language | English |
publishDate | 2018-09-01 |
publisher | Tsinghua University Press |
record_format | Article |
series | Big Data Mining and Analytics |
spelling | doaj-art-0da5127f96ed4180b5d5ee246faff785; 2025-02-02T06:00:35Z; eng; Tsinghua University Press; Big Data Mining and Analytics; 2096-0654; 2018-09-01; volume 1, issue 3, pages 191-210; 10.26599/BDMA.2018.9020018; Survey on Encoding Schemes for Genomic Data Representation and Feature Learning—From Signal Processing to Machine Learning; Ning Yu (0), Zhihua Li (1), Zeng Yu (2); (0) Department of Computing Sciences, College at Brockport, State University of New York, Brockport, NY 14422, USA; (1) Department of Computer Science and Technology at Jiangnan University, Wuxi 214122, China; (2) School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China; Data-driven machine learning, especially deep learning technology, is becoming an important tool for handling big data issues in bioinformatics. In machine learning, DNA sequences are often converted to numerical values for data representation and feature learning in various applications. Similar conversion occurs in Genomic Signal Processing (GSP), where genome sequences are transformed into numerical sequences for signal extraction and recognition. This kind of conversion is also called encoding scheme. The diverse encoding schemes can greatly affect the performance of GSP applications and machine learning models. This paper aims to collect, analyze, discuss, and summarize the existing encoding schemes of genome sequence particularly in GSP as well as other genome analysis applications to provide a comprehensive reference for the genomic data representation and feature learning in machine learning.; https://www.sciopen.com/article/10.26599/BDMA.2018.9020018; encoding scheme; data representation; feature learning; deep learning; genomic signal processing; machine learning; genome analysis |
title | Survey on Encoding Schemes for Genomic Data Representation and Feature Learning—From Signal Processing to Machine Learning |
topic | encoding scheme; data representation; feature learning; deep learning; genomic signal processing; machine learning; genome analysis |
url | https://www.sciopen.com/article/10.26599/BDMA.2018.9020018 |