Sequence-structure based prediction of pathogenicity for amino acid substitutions in proteins associated with primary immunodeficiencies

Bibliographic Details
Main Authors: Ekaterina S. Porfireva, Anton D. Zadorozhny, Anastasia V. Rudik, Dmitry A. Filimonov, Alexey A. Lagunin
Format: Article
Language: English
Published: Frontiers Media S.A. 2025-02-01
Series: Frontiers in Immunology
Online Access: https://www.frontiersin.org/articles/10.3389/fimmu.2025.1492751/full
collection DOAJ
description Introduction: Primary immunodeficiencies (PIDs) are a group of rare genetic disorders characterized by dysfunction of immune system components. Early diagnosis and treatment are essential to prevent severe or life-threatening complications. PIDs manifest with diverse clinical symptoms, which makes accurate diagnosis challenging. A key aspect of PID diagnosis is identifying specific amino acid substitutions in proteins related to heritable diseases. In this study, we developed classification sequence-structure-property relationships (SSPR) models for predicting the pathogenicity of amino acid substitutions (AASs) in 25 proteins associated with the most important and genetically studied PIDs. The proteins are encoded by the genes IL2RG, JAK3, RAG1, RAG2, ADA, DCLRE1C, CD40LG, WAS, ATM, STAT3, KMT2D, BTK, FOXP3, AIRE, FAS, ELANE, ITGB2, CYBB, G6PD, GATA2, STAT1, IFIH1, NLRP3, MEFV, and SERPING1.
Methods: Data on 4825 pathogenic and benign AASs in the selected proteins were extracted from ClinVar and gnomAD. SSPR models were created for each protein using the MultiPASS software, which is based on a Bayesian algorithm and uses different levels of MNA (Multilevel Neighborhoods of Atoms) descriptors to represent the structural formulas of protein fragments that include an AAS.
Results: Prediction accuracy was assessed by 5-fold cross-validation and compared with that of other bioinformatics tools: SIFT4G, PolyPhen-2 HDIV, FATHMM, MetaSVM, PROVEAN, ClinPred, and AlphaMissense. The best SSPR models demonstrated high accuracy across all genes, with an average ROC AUC of 0.831 ± 0.037, balanced accuracy of 0.763 ± 0.034, MCC of 0.457 ± 0.06, and F-measure of 0.623 ± 0.07, outperforming the most popular bioinformatics tools.
Conclusions: The best SSPR models for predicting the pathogenicity of amino acid substitutions related to PIDs have been implemented in a freely available web application, SAV-Pred (Single Amino acid Variants Predictor, http://www.way2drug.com/SAV-Pred/), which may be a useful tool for medical geneticists and clinicians. Examples of the use of SAV-Pred for some clinical cases of PIDs are provided.
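The balanced accuracy, MCC, and F-measure quoted in the abstract can all be derived from a binary confusion matrix (pathogenic as the positive class, benign as the negative class). The following is a minimal sketch of those standard formulas, not the authors' evaluation code; the function name `binary_metrics` and the counts in the usage line are hypothetical and chosen only for illustration.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute balanced accuracy, MCC and F-measure from confusion-matrix counts.

    tp/fn: pathogenic variants predicted as pathogenic/benign.
    tn/fp: benign variants predicted as benign/pathogenic.
    Any ratio with an empty denominator is reported as 0.0.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    # Balanced accuracy: mean of the per-class recalls.
    balanced_accuracy = (sensitivity + specificity) / 2

    # Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    # F-measure (F1): harmonic mean of precision and sensitivity.
    pr_sum = precision + sensitivity
    f_measure = 2 * precision * sensitivity / pr_sum if pr_sum else 0.0

    return {
        "balanced_accuracy": balanced_accuracy,
        "mcc": mcc,
        "f_measure": f_measure,
    }


# Hypothetical counts for one protein's held-out fold (illustration only).
print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
```

In a 5-fold cross-validation, a function like this would typically be applied to each held-out fold and the results averaged. Balanced accuracy and MCC are commonly preferred over plain accuracy when the pathogenic and benign classes differ in size.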
id doaj-art-1a20a08f7d8e49e5bd185963ad7b7455
institution Kabale University
issn 1664-3224
spelling doaj-art-1a20a08f7d8e49e5bd185963ad7b7455; 2025-02-05T07:32:48Z; eng; Frontiers Media S.A.; Frontiers in Immunology; 1664-3224; 2025-02-01; 16; 10.3389/fimmu.2025.1492751; 1492751
Ekaterina S. Porfireva: Department of Bioinformatics, Pirogov Russian National Research Medical University, Moscow, Russia
Anton D. Zadorozhny: Department of Bioinformatics, Pirogov Russian National Research Medical University, Moscow, Russia
Anastasia V. Rudik: Laboratory of Structure-Function Based Drug Design, Institute of Biomedical Chemistry, Moscow, Russia
Dmitry A. Filimonov: Laboratory of Structure-Function Based Drug Design, Institute of Biomedical Chemistry, Moscow, Russia
Alexey A. Lagunin: Department of Bioinformatics, Pirogov Russian National Research Medical University, Moscow, Russia; Laboratory of Structure-Function Based Drug Design, Institute of Biomedical Chemistry, Moscow, Russia
topic primary immunodeficiencies
amino acid substitutions
pathogenicity prediction
sequence-structure-property relationships
human genetic variation
SAV-Pred