Re-ranking sequencing variants in the post-GWAS era for accurate causal variant identification.

Bibliographic Details
Main Authors: Laura L Faye, Mitchell J Machiela, Peter Kraft, Shelley B Bull, Lei Sun
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2013-01-01
Series: PLoS Genetics
Online Access: https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1003609&type=printable
author Laura L Faye
Mitchell J Machiela
Peter Kraft
Shelley B Bull
Lei Sun
collection DOAJ
description Next-generation sequencing has dramatically increased our ability to localize disease-causing variants by providing base-pair level information at costs increasingly feasible for the large sample sizes required to detect complex-trait associations. Yet, identification of causal variants within an established region of association remains a challenge. Counter-intuitively, certain factors that increase power to detect an associated region can decrease power to localize the causal variant. First, combining GWAS with imputation or low-coverage sequencing to achieve the large sample sizes required for high power can have the unintended effect of producing differential genotyping error among SNPs. This tends to bias the relative evidence for association toward better genotyped SNPs. Second, re-use of GWAS data for fine-mapping exploits previous findings to ensure genome-wide significance in GWAS-associated regions. However, using GWAS findings to inform fine-mapping analysis can bias evidence away from the causal SNP toward the tag SNP and SNPs in high LD with the tag. Together, these factors can reduce power to localize the causal SNP by more than half. Other strategies commonly employed to increase power to detect association, namely increasing sample size and using higher-density genotyping arrays, can, in certain common scenarios, actually exacerbate these effects and further decrease power to localize causal variants. We develop a re-ranking procedure that accounts for these adverse effects and substantially improves the accuracy of causal SNP identification, often doubling the probability that the causal SNP is top-ranked. Application to the NCI BPC3 aggressive prostate cancer GWAS with imputation meta-analysis identified a new top SNP at 2 of 3 associated loci and several additional possible causal SNPs at these loci that may have otherwise been overlooked. This method is simple to implement using R scripts provided on the author's website.
format Article
id doaj-art-dfcc4c906b7f4e4e9fb6d013a7c657a1
institution OA Journals
issn 1553-7390
1553-7404
language English
publishDate 2013-01-01
publisher Public Library of Science (PLoS)
record_format Article
series PLoS Genetics
spelling doaj-art-dfcc4c906b7f4e4e9fb6d013a7c657a1 | 2025-08-20T02:22:49Z | eng | Public Library of Science (PLoS) | PLoS Genetics | 1553-7390 | 1553-7404 | 2013-01-01 | 9 | 8 | e1003609 | 10.1371/journal.pgen.1003609 | Re-ranking sequencing variants in the post-GWAS era for accurate causal variant identification. | Laura L Faye | Mitchell J Machiela | Peter Kraft | Shelley B Bull | Lei Sun | https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1003609&type=printable
title Re-ranking sequencing variants in the post-GWAS era for accurate causal variant identification.
url https://journals.plos.org/plosgenetics/article/file?id=10.1371/journal.pgen.1003609&type=printable