Survey on Encoding Schemes for Genomic Data Representation and Feature Learning—From Signal Processing to Machine Learning

Data-driven machine learning, especially deep learning technology, is becoming an important tool for handling big data issues in bioinformatics. In machine learning, DNA sequences are often converted to numerical values for data representation and feature learning in various applications. Similar co...

Bibliographic Details
Main Authors: Ning Yu, Zhihua Li, Zeng Yu
Format: Article
Language: English
Published: Tsinghua University Press 2018-09-01
Series: Big Data Mining and Analytics
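Subjects: encoding scheme; data representation; feature learning; deep learning; genomic signal processing; machine learning; genome analysis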
Online Access: https://www.sciopen.com/article/10.26599/BDMA.2018.9020018
_version_ 1832572923107344384
author Ning Yu
Zhihua Li
Zeng Yu
collection DOAJ
description Data-driven machine learning, especially deep learning technology, is becoming an important tool for handling big data issues in bioinformatics. In machine learning, DNA sequences are often converted to numerical values for data representation and feature learning in various applications. A similar conversion occurs in Genomic Signal Processing (GSP), where genome sequences are transformed into numerical sequences for signal extraction and recognition. This kind of conversion is also called an encoding scheme. The choice of encoding scheme can greatly affect the performance of GSP applications and machine learning models. This paper aims to collect, analyze, discuss, and summarize the existing encoding schemes for genome sequences, particularly in GSP as well as in other genome analysis applications, to provide a comprehensive reference for genomic data representation and feature learning in machine learning.
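To make the idea of an encoding scheme concrete, here is a minimal Python sketch of two widely used ways to turn a DNA string into numbers: an integer mapping and a one-hot (indicator) mapping. It is illustrative only. The function names, the A/C/G/T ordering, and the choice of these two schemes are assumptions made for this example; the record does not state which schemes the survey covers or how it defines them.

```python
# Illustrative sketch only, not taken from the surveyed article.
# ALPHABET ordering and both function names are assumptions for this example.

ALPHABET = "ACGT"  # assumed base ordering: A=0, C=1, G=2, T=3


def integer_encode(seq):
    """Map each base to its integer index in ALPHABET.

    Raises ValueError for symbols outside A/C/G/T (for example 'N').
    """
    return [ALPHABET.index(base) for base in seq.upper()]


def one_hot_encode(seq):
    """Map each base to a 4-element indicator vector (one 1, three 0s).

    Symbols outside A/C/G/T become an all-zero vector.
    """
    return [[1 if base == ref else 0 for ref in ALPHABET] for base in seq.upper()]


if __name__ == "__main__":
    print(integer_encode("GATTACA"))
    print(one_hot_encode("ACG"))
```

The integer mapping is compact but imposes an arbitrary numeric order on the bases, while the one-hot mapping avoids that at the cost of four values per base. This is one simple illustration of why the choice of scheme can affect downstream models.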
format Article
id doaj-art-0da5127f96ed4180b5d5ee246faff785
institution Kabale University
issn 2096-0654
language English
publishDate 2018-09-01
publisher Tsinghua University Press
record_format Article
series Big Data Mining and Analytics
spelling doaj-art-0da5127f96ed4180b5d5ee246faff785
2025-02-02T06:00:35Z
Big Data Mining and Analytics, volume 1, issue 3, pages 191-210 (2018-09-01), DOI 10.26599/BDMA.2018.9020018
Ning Yu: Department of Computing Sciences, College at Brockport, State University of New York, Brockport, NY 14422, USA.
Zhihua Li: Department of Computer Science and Technology at Jiangnan University, Wuxi 214122, China.
Zeng Yu: School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China.
title Survey on Encoding Schemes for Genomic Data Representation and Feature Learning—From Signal Processing to Machine Learning
topic encoding scheme
data representation
feature learning
deep learning
genomic signal processing
machine learning
genome analysis
url https://www.sciopen.com/article/10.26599/BDMA.2018.9020018