Evaluating Neural Network Performance in Predicting Disease Status and Tissue Source of JC Polyomavirus from Patient Isolates Based on the Hypervariable Region of the Viral Genome
JC polyomavirus (JCPyV) establishes a persistent, asymptomatic kidney infection in most of the population. However, JCPyV can reactivate in immunocompromised individuals and cause progressive multifocal leukoencephalopathy (PML), a fatal demyelinating disease with no approved treatment. Mutations in...
Main Authors: | Aiden M. C. Pike, Saeed Amal, Melissa S. Maginnis, Michael P. Wilczek |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2024-12-01 |
Series: | Viruses |
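Subjects: | JC polyomavirus; non-coding control region; k-mer; machine learning; neural network; multilayer perceptron |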
Online Access: | https://www.mdpi.com/1999-4915/17/1/12 |
_version_ | 1832587353549438976 |
---|---|
author | Aiden M. C. Pike; Saeed Amal; Melissa S. Maginnis; Michael P. Wilczek |
collection | DOAJ |
description | JC polyomavirus (JCPyV) establishes a persistent, asymptomatic kidney infection in most of the population. However, JCPyV can reactivate in immunocompromised individuals and cause progressive multifocal leukoencephalopathy (PML), a fatal demyelinating disease with no approved treatment. Mutations in the hypervariable non-coding control region (NCCR) of the JCPyV genome have been linked to disease outcomes and neuropathogenesis, yet few meta-analyses document these associations. Many online sequence entries, including those in NCBI databases, lack sufficient sample information, which limits large-scale analyses of NCCR sequences. Machine learning techniques, however, can augment the data available for analysis. This study uses a previously compiled dataset of 989 JCPyV NCCR sequences from GenBank, with associated patient PML status and viral tissue source, to train multilayer perceptrons that predict missing information within the dataset. The PML status and tissue source models were 100% and 87.8% accurate, respectively. Within the dataset, 348 samples had an unconfirmed PML status; of these, 259 were predicted as No PML and 89 as PML sequences. Of the 63 sequences with unconfirmed tissue sources, 8 were predicted as urine, 13 as blood, and 42 as cerebrospinal fluid. These models can improve viral sequence identification and provide insights into viral mutations and pathogenesis. |
format | Article |
id | doaj-art-5578aab113e34c6e8c1038c39d180de3 |
institution | Kabale University |
issn | 1999-4915 |
language | English |
publishDate | 2024-12-01 |
publisher | MDPI AG |
record_format | Article |
series | Viruses |
spelling | doaj-art-5578aab113e34c6e8c1038c39d180de3; 2025-01-24T13:52:15Z; eng; MDPI AG; Viruses; 1999-4915; 2024-12-01; 17(1), 12; 10.3390/v17010012; Evaluating Neural Network Performance in Predicting Disease Status and Tissue Source of JC Polyomavirus from Patient Isolates Based on the Hypervariable Region of the Viral Genome; Aiden M. C. Pike (0), Saeed Amal (1), Melissa S. Maginnis (2), Michael P. Wilczek (3); Maine Space Grant Consortium, Augusta, ME 04330, USA; The Roux Institute, Northeastern University, Portland, ME 04101, USA; Department of Molecular and Biomedical Sciences, University of Maine, Orono, ME 04469, USA; Life Sciences, Health, and Engineering Department, The Roux Institute, Northeastern University, Portland, ME 04101, USA; https://www.mdpi.com/1999-4915/17/1/12; JC polyomavirus; non-coding control region; k-mer; machine learning; neural network; multilayer perceptron |
title | Evaluating Neural Network Performance in Predicting Disease Status and Tissue Source of JC Polyomavirus from Patient Isolates Based on the Hypervariable Region of the Viral Genome |
topic | JC polyomavirus; non-coding control region; k-mer; machine learning; neural network; multilayer perceptron |
url | https://www.mdpi.com/1999-4915/17/1/12 |
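
The record lists "k-mer" and "multilayer perceptron" among its topics but does not describe the feature pipeline itself. As a rough illustration only, the sketch below shows one common way to turn a nucleotide sequence into a fixed-length k-mer frequency vector of the kind a multilayer perceptron could take as input. The value of k, the frequency normalisation, the handling of ambiguous bases, and all function names are assumptions made for this example and are not taken from the article.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumption: only unambiguous DNA bases are counted


def kmer_vocabulary(k):
    """Return all 4**k possible k-mers in a fixed lexicographic order."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def kmer_counts(sequence, k):
    """Count overlapping k-mers, skipping windows with non-ACGT symbols."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if all(base in ALPHABET for base in window):
            counts[window] += 1
    return counts


def kmer_vector(sequence, k):
    """Build a frequency-normalised feature vector of length 4**k."""
    vocab = kmer_vocabulary(k)
    counts = kmer_counts(sequence, k)
    total = sum(counts.values())
    if total == 0:
        # No valid k-mer found (sequence too short or fully ambiguous).
        return [0.0] * len(vocab)
    return [counts[kmer] / total for kmer in vocab]


# Hypothetical toy sequence, not taken from the dataset in this record.
toy_sequence = "ACGTACGTNACG"
features = kmer_vector(toy_sequence, k=2)
print(len(features))  # one entry per possible 2-mer
```

Vectors built this way have the same length for every sequence, which is what lets sequences of different lengths share one classifier input layer; whether the authors used counts, frequencies, or another encoding is not stated in this record.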