Risk Prediction Using Genome-Wide Association Studies on Type 2 Diabetes

Bibliographic Details
Main Authors: Sungkyoung Choi, Sunghwan Bae, Taesung Park
Format: Article
Language: English
Published: BioMed Central 2016-12-01
Series: Genomics & Informatics
Online Access: http://genominfo.org/upload/pdf/gni-14-138.pdf
collection DOAJ
description The success of genome-wide association studies (GWASs) has made it possible to improve risk assessment and has identified novel genetic variants for diagnosis, prevention, and treatment. However, most variants discovered by GWASs have been reported to have very small effect sizes on complex human diseases, which has been a major obstacle to building risk prediction models. Many recent statistical approaches based on penalized regression address the “large p, small n” problem, in which the number of predictors (p) far exceeds the number of samples (n). In this report, we evaluated the performance of several statistical methods for predicting a binary trait: stepwise logistic regression (SLR), the least absolute shrinkage and selection operator (LASSO), and Elastic-Net (EN). We built prediction models for type 2 diabetes by combining a variable selection method with a prediction method (e.g., SLR-LASSO denotes SLR for variable selection followed by LASSO for prediction), using Affymetrix Genome-Wide Human SNP Array 5.0 data from the Korean Association Resource project. We assessed risk prediction performance by the area under the receiver operating characteristic curve (AUC) in internal and external validation datasets. In internal validation, SLR-LASSO and SLR-EN tended to yield more accurate predictions than the other combinations. In external validation, SLR-SLR and SLR-EN achieved the highest AUC, 0.726. We propose these combinations as potentially powerful risk prediction models for type 2 diabetes.
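The abstract names the penalized methods and the AUC metric without showing how they fit together. As a minimal sketch, assuming toy simulated genotypes rather than the Korean Association Resource data, the pure-Python code below fits an L1 (LASSO) or Elastic-Net penalized logistic regression by proximal gradient descent and scores held-out predictions by AUC. It is not the authors' pipeline: the function names, the glmnet-style penalty parametrization (`lam`, `alpha`), the step size, the iteration count, and the simulated effect sizes are all hypothetical, and the stepwise (SLR) selection step is omitted.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t * |v|, i.e. the L1 part of the penalty.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_elastic_net_logistic(X, y, lam=0.05, alpha=0.5, step=0.1, iters=300):
    """Proximal-gradient (ISTA) fit of penalized logistic regression.

    Objective (hypothetical glmnet-style parametrization):
        mean log-loss + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2)
    alpha = 1 gives the LASSO; 0 < alpha < 1 gives the Elastic-Net.
    The intercept b0 is not penalized.
    """
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(iters):
        # Gradient of the mean log-loss with respect to b0 and b.
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(bj * xij for bj, xij in zip(b, xi))) - yi
            g0 += r / n
            for j in range(p):
                g[j] += r * xi[j] / n
        b0 -= step * g0
        for j in range(p):
            # Smooth part: log-loss gradient plus the ridge (L2) term.
            v = b[j] - step * (g[j] + lam * (1 - alpha) * b[j])
            # Non-smooth part: soft-thresholding applies the L1 term.
            b[j] = soft_threshold(v, step * lam * alpha)
    return b0, b


def predict_proba(b0, b, X):
    # Predicted risk for each subject.
    return [sigmoid(b0 + sum(bj * xij for bj, xij in zip(b, xi))) for xi in X]


def auc(scores, labels):
    # AUC = P(random case scores higher than random control); ties count 1/2.
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def simulate(n, p, seed):
    # Hypothetical toy data: SNPs coded 0/1/2 (minor-allele counts, MAF 0.5);
    # only the first two SNPs affect disease risk.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [float(rng.randint(0, 1) + rng.randint(0, 1)) for _ in range(p)]
        eta = 1.5 * (x[0] - 1) + 1.5 * (x[1] - 1)
        X.append(x)
        y.append(1 if rng.random() < sigmoid(eta) else 0)
    return X, y


if __name__ == "__main__":
    X_train, y_train = simulate(200, 10, seed=1)
    X_test, y_test = simulate(200, 10, seed=2)
    for name, alpha in [("LASSO", 1.0), ("Elastic-Net", 0.5)]:
        b0, b = fit_elastic_net_logistic(X_train, y_train, lam=0.05, alpha=alpha)
        test_auc = auc(predict_proba(b0, b, X_test), y_test)
        print(name, "test AUC:", round(test_auc, 3))
```

Setting `alpha=1.0` makes the soft-thresholding step zero out weak SNP coefficients, which is the variable-selection behavior of the LASSO; lowering `alpha` blends in a ridge penalty, which the Elastic-Net uses to handle correlated predictors such as SNPs in linkage disequilibrium. In practice one would tune `lam` (and `alpha`) by cross-validation on the training set only, then report AUC on a separate validation set.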
id doaj-art-5d080074b3cb472ea8e164b3faab0087
institution Kabale University
issn 1598-866X
2234-0742
affiliation Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea (all three authors)
citation Genomics & Informatics, vol. 14, no. 4, pp. 138-148
doi 10.5808/GI.2016.14.4.138
topic clinical prediction rule
genome-wide association study
penalized regression models
type 2 diabetes