Evaluating Neural Network Performance in Predicting Disease Status and Tissue Source of JC Polyomavirus from Patient Isolates Based on the Hypervariable Region of the Viral Genome

Bibliographic Details
Main Authors: Aiden M. C. Pike, Saeed Amal, Melissa S. Maginnis, Michael P. Wilczek
Format: Article
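Language: English
Published: MDPI AG 2024-12-01
Series: Viruses
Subjects: JC polyomavirus; non-coding control region; k-mer; machine learning; neural network; multilayer perceptron
Online Access: https://www.mdpi.com/1999-4915/17/1/12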
collection DOAJ
description JC polyomavirus (JCPyV) establishes a persistent, asymptomatic kidney infection in most of the population. However, JCPyV can reactivate in immunocompromised individuals and cause progressive multifocal leukoencephalopathy (PML), a fatal demyelinating disease with no approved treatment. Mutations in the hypervariable non-coding control region (NCCR) of the JCPyV genome have been linked to disease outcomes and neuropathogenesis, yet few meta-analyses document these associations. Many online sequence entries, including those on NCBI databases, lack sufficient sample information, limiting large-scale analyses of NCCR sequences. Machine learning techniques, however, can augment available data for analysis. This study employs a previously compiled dataset of 989 JCPyV NCCR sequences from GenBank with associated patient PML status and viral tissue source to train multilayer perceptrons for predicting missing information within the dataset. The PML status and tissue source models were 100% and 87.8% accurate, respectively. Within the dataset, 348 samples had an unconfirmed PML status, of which 259 were predicted as No PML and 89 as PML sequences. Of the 63 sequences with unconfirmed tissue sources, eight samples were predicted as urine, 13 as blood, and 42 as cerebrospinal fluid. These models can improve viral sequence identification and provide insights into viral mutations and pathogenesis.
id doaj-art-5578aab113e34c6e8c1038c39d180de3
institution Kabale University
issn 1999-4915
spelling doaj-art-5578aab113e34c6e8c1038c39d180de3; 2025-01-24T13:52:15Z; eng; MDPI AG; Viruses; ISSN 1999-4915; 2024-12-01; vol. 17, no. 1, art. 12; DOI 10.3390/v17010012
Aiden M. C. Pike (Maine Space Grant Consortium, Augusta, ME 04330, USA)
Saeed Amal (The Roux Institute, Northeastern University, Portland, ME 04101, USA)
Melissa S. Maginnis (Department of Molecular and Biomedical Sciences, University of Maine, Orono, ME 04469, USA)
Michael P. Wilczek (Life Sciences, Health, and Engineering Department, The Roux Institute, Northeastern University, Portland, ME 04101, USA)
topic JC polyomavirus
non-coding control region
k-mer
machine learning
neural network
multilayer perceptron