PLncWX: A Machine-Learning Algorithm for Plant lncRNA Identification Based on WOA-XGBoost
Long noncoding RNAs (lncRNAs) are a class of RNAs longer than 200 nt that cannot encode proteins. Studies have shown that lncRNAs can regulate gene expression at the epigenetic, transcriptional, and posttranscriptional levels; they are not only closely related to the occurrence, development, and...
Main Authors: | Fei Guo, Zhixiang Yin, Kai Zhou, Jiasi Li |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley 2021-01-01 |
Series: | Journal of Chemistry |
Online Access: | http://dx.doi.org/10.1155/2021/6256021 |
_version_ | 1832566422061973504 |
---|---|
author | Fei Guo, Zhixiang Yin, Kai Zhou, Jiasi Li |
author_sort | Fei Guo |
collection | DOAJ |
description | Long noncoding RNAs (lncRNAs) are a class of RNAs longer than 200 nt that cannot encode proteins. Studies have shown that lncRNAs regulate gene expression at the epigenetic, transcriptional, and posttranscriptional levels; they are closely related to the occurrence, development, and prevention of human diseases, and they also regulate plant flowering and participate in plant responses to abiotic stresses such as drought and salt. Accurate and efficient identification of lncRNAs therefore remains an essential task for research in this area. Many identification tools based on machine-learning and deep-learning algorithms exist, but most use human and mouse gene sequences as training sets and seldom plant sequences, and they apply only one feature selection method, or one class of methods, after feature extraction. We developed an identification model covering dicots, monocots, algae, mosses, and ferns. After comparing 20 feature selection methods (seven filter and thirteen wrapper methods), each combined with seven classifiers, while considering both the correlation between features and model redundancy, we found that the WOA-XGBoost-based model performed better, with an accuracy of 91.55%, an AUC of 96.78%, and an F1_score of 91.68%. Meanwhile, the feature subset was reduced to 23 elements, which effectively improved prediction accuracy and modeling efficiency. |
format | Article |
id | doaj-art-97bfb4d8ddf042dca7a46fe23e0e47ca |
institution | Kabale University |
issn | 2090-9071 |
language | English |
publishDate | 2021-01-01 |
publisher | Wiley |
record_format | Article |
series | Journal of Chemistry |
spelling | doaj-art-97bfb4d8ddf042dca7a46fe23e0e47ca; 2025-02-03T01:04:11Z; eng; Wiley; Journal of Chemistry; 2090-9071; 2021-01-01; 2021; 10.1155/2021/6256021; PLncWX: A Machine-Learning Algorithm for Plant lncRNA Identification Based on WOA-XGBoost; Fei Guo, Zhixiang Yin, Kai Zhou, Jiasi Li (each: School of Mathematics Physics and Statistics); http://dx.doi.org/10.1155/2021/6256021 |
title | PLncWX: A Machine-Learning Algorithm for Plant lncRNA Identification Based on WOA-XGBoost |
url | http://dx.doi.org/10.1155/2021/6256021 |
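
The description above reports three evaluation metrics for the classifier: accuracy, AUC, and F1_score. As a minimal sketch of how such metrics are computed for a binary lncRNA / non-lncRNA classifier, the following pure-Python example may help. The function names, the 0.5 decision threshold, and the toy labels and scores are illustrative assumptions; they are not taken from the article or its code.

```python
# Illustrative sketch only: names, threshold, and data are assumptions,
# not the article's implementation.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive example scores higher than a random negative one (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = lncRNA, 0 = protein-coding transcript.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]           # predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # assumed 0.5 threshold

print("accuracy:", accuracy(y_true, y_pred))
print("F1_score:", f1_score(y_true, y_pred))
print("AUC:", auc(y_true, scores))
```

Accuracy and F1_score depend on the chosen threshold, while AUC is computed from the raw scores and is threshold-independent, which is one reason the three are commonly reported together.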