RNA sequence analysis landscape: A comprehensive review of task types, databases, datasets, word embedding methods, and language models

Deciphering information of RNA sequences reveals their diverse roles in living organisms, including gene regulation and protein synthesis. Aberrations in RNA sequence such as dysregulation and mutations can drive a diverse spectrum of diseases including cancers, genetic disorders, and neurodegenerat...

Bibliographic Details
Main Authors: Muhammad Nabeel Asim, Muhammad Ali Ibrahim, Tayyaba Asif, Andreas Dengel
Format: Article
Language: English
Published: Elsevier 2025-01-01
Series: Heliyon
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S2405844024175196
_version_ 1832573144742756352
author Muhammad Nabeel Asim
Muhammad Ali Ibrahim
Tayyaba Asif
Andreas Dengel
author_sort Muhammad Nabeel Asim
collection DOAJ
description Deciphering the information in RNA sequences reveals their diverse roles in living organisms, including gene regulation and protein synthesis. Aberrations in RNA sequences, such as dysregulation and mutations, can drive a diverse spectrum of diseases, including cancers, genetic disorders, and neurodegenerative conditions. Furthermore, researchers are harnessing RNA's therapeutic potential to transform traditional treatment paradigms into personalized therapies through the development of RNA-based drugs and gene therapies. To gain insights into biological functions, detect diseases at early stages, and develop potent therapeutics, researchers perform diverse types of RNA sequence analysis tasks. RNA sequence analysis through conventional wet-lab methods is expensive, time-consuming, and error-prone. Enabling large-scale RNA sequence analysis by empowering wet-lab experimental methods with Artificial Intelligence (AI) applications requires scientists to have comprehensive knowledge of both the RNA and AI fields. While molecular biologists encounter challenges in understanding AI methods, computer scientists often lack the basic foundations of RNA sequence analysis tasks. Given the absence of comprehensive literature that bridges this research gap and promotes the development of AI-driven RNA sequence analysis applications, the contributions of this manuscript are manifold: It equips AI researchers with the biological foundations of 47 distinct RNA sequence analysis tasks. It sets the stage for the development of benchmark datasets for these 47 tasks by summarizing the essentials of 64 different biological databases. It presents applications of word embeddings and language models across the 47 tasks. It streamlines the development of new predictors by providing a comprehensive survey of the performance values of predictive pipelines based on 58 word embeddings and 70 language models, as well as the top-performing predictors based on traditional sequence encoding and their performance, across the 47 tasks.
format Article
id doaj-art-57f283c7c36e4935b3ef0fd7c2c4f579
institution Kabale University
issn 2405-8440
language English
publishDate 2025-01-01
publisher Elsevier
record_format Article
series Heliyon
spelling doaj-art-57f283c7c36e4935b3ef0fd7c2c4f579 2025-02-02T05:27:49Z eng Elsevier Heliyon 2405-8440 2025-01-01 11 2 e41488
RNA sequence analysis landscape: A comprehensive review of task types, databases, datasets, word embedding methods, and language models
Muhammad Nabeel Asim (0), Muhammad Ali Ibrahim (1), Tayyaba Asif (2), Andreas Dengel (3)
(0) German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany; Corresponding author.
(1) German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany; Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
(2) Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
(3) German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany; Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
http://www.sciencedirect.com/science/article/pii/S2405844024175196
RNA sequence analysis; Deep learning; Machine learning; Artificial intelligence; AI applications in genomics; Multi-omics
title RNA sequence analysis landscape: A comprehensive review of task types, databases, datasets, word embedding methods, and language models
topic RNA sequence analysis
Deep learning
Machine learning
Artificial intelligence
AI applications in genomics
Multi-omics
url http://www.sciencedirect.com/science/article/pii/S2405844024175196