PLncWX: A Machine-Learning Algorithm for Plant lncRNA Identification Based on WOA-XGBoost

Bibliographic Details
Main Authors: Fei Guo, Zhixiang Yin, Kai Zhou, Jiasi Li
Format: Article
Language: English
Published: Wiley, 2021-01-01
Series: Journal of Chemistry
Online Access: http://dx.doi.org/10.1155/2021/6256021
ISSN: 2090-9071
Collection: DOAJ
Affiliation: School of Mathematics Physics and Statistics (all authors)

Abstract
Long noncoding RNAs (lncRNAs) are RNAs longer than 200 nt that do not encode proteins. Studies have shown that lncRNAs can regulate gene expression at the epigenetic, transcriptional, and posttranscriptional levels. They are closely related to the occurrence, development, and prevention of human diseases, and they can also regulate plant flowering and participate in plant responses to abiotic stresses such as drought and salt. Accurate and efficient identification of lncRNAs therefore remains an essential task in this research area. Many identification tools based on machine-learning and deep-learning algorithms have been developed, but most use human and mouse gene sequences as training sets, seldom plant sequences, and apply only one feature selection method (or one class of methods) after feature extraction. We developed an identification model covering dicots, monocots, algae, mosses, and ferns. We compared 20 feature selection methods (seven filter and thirteen wrapper methods), each combined with seven classifiers, while accounting for both the correlation between features and model redundancy. The model based on WOA-XGBoost (the whale optimization algorithm for feature selection combined with an XGBoost classifier) performed best, achieving an accuracy of 91.55%, an AUC of 96.78%, and an F1 score of 91.68%. In addition, the feature subset was reduced to 23 features, which effectively improved prediction accuracy and modeling efficiency.
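The abstract describes wrapper feature selection driven by the whale optimization algorithm (WOA), with XGBoost as the classifier. The record does not include the authors' code, so the following is only a minimal sketch of a generic binary WOA wrapper, not the PLncWX implementation. It assumes a sigmoid transfer function to turn whale positions into feature masks and the common wrapper fitness alpha * error + (1 - alpha) * (fraction of features kept). The names `binary_woa`, `binarize`, `toy_fitness`, the parameters (`n_whales`, `n_iter`, `bound`, `alpha`) and the "informative" features are hypothetical; `toy_fitness` is a placeholder where a cross-validated classifier error would go.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple


def sigmoid(x: float) -> float:
    """S-shaped transfer function: maps a continuous coordinate to a bit probability."""
    return 1.0 / (1.0 + math.exp(-x))


def binarize(position: Sequence[float], rng: random.Random) -> List[int]:
    """Convert a continuous whale position into a feature mask (1 = feature kept)."""
    mask = [1 if rng.random() < sigmoid(x) else 0 for x in position]
    if not any(mask):  # a classifier needs at least one feature
        mask[rng.randrange(len(mask))] = 1
    return mask


def binary_woa(
    fitness: Callable[[List[int]], float],
    n_features: int,
    n_whales: int = 20,
    n_iter: int = 30,
    bound: float = 6.0,
    seed: int = 0,
) -> Tuple[List[int], float, List[float]]:
    """Minimize fitness(mask) over binary feature masks with a binary whale optimization algorithm."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(n_features)] for _ in range(n_whales)]
    best_pos: Optional[List[float]] = None
    best_mask: List[int] = []
    best_fit = math.inf
    history: List[float] = []  # best fitness after each iteration

    for t in range(n_iter):
        if t > 0:
            a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0
            for i in range(n_whales):
                A = 2.0 * a * rng.random() - a
                C = 2.0 * rng.random()
                l = rng.uniform(-1.0, 1.0)
                if rng.random() < 0.5:
                    # |A| < 1: encircle the leader (exploitation);
                    # otherwise move relative to a random whale (exploration).
                    ref = best_pos if abs(A) < 1.0 else pos[rng.randrange(n_whales)]
                    new = [r - A * abs(C * r - x) for r, x in zip(ref, pos[i])]
                else:
                    # Spiral (bubble-net) move around the leader with spiral constant b = 1.
                    new = [abs(lead - x) * math.exp(l) * math.cos(2 * math.pi * l) + lead
                           for lead, x in zip(best_pos, pos[i])]
                pos[i] = [max(-bound, min(bound, v)) for v in new]  # keep sigmoid inputs bounded

        # Evaluate every whale and keep the best-so-far (elitist) leader.
        for p in pos:
            mask = binarize(p, rng)
            f = fitness(mask)
            if f < best_fit:
                best_pos, best_mask, best_fit = list(p), mask, f
        history.append(best_fit)

    return best_mask, best_fit, history


# Hypothetical stand-in for a classifier: pretend features 0, 3 and 7 are the informative ones.
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask: List[int], alpha: float = 0.99) -> float:
    """alpha * error + (1 - alpha) * fraction of features kept (lower is better)."""
    selected = {i for i, bit in enumerate(mask) if bit}
    error = len(INFORMATIVE - selected) / len(INFORMATIVE)  # placeholder for 1 - CV accuracy
    return alpha * error + (1 - alpha) * len(selected) / len(mask)


mask, score, history = binary_woa(toy_fitness, n_features=12)
print("selected features:", [i for i, bit in enumerate(mask) if bit], "fitness:", round(score, 4))
```

To make this an actual wrapper method, `toy_fitness` would be replaced by a function that trains a classifier (for example an XGBoost model) on the columns selected by `mask` and returns the weighted cross-validated error plus the subset-size penalty; the toy version exists only so the sketch runs with the standard library. Keeping an elitist best-so-far leader means the recorded best fitness can never get worse between iterations.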