
Significance Thresholds for Rare Variant Signals

Abstract

With the advent of large-scale DNA sequencing studies, it is worth considering how to estimate the required significance threshold for tests of association between the resulting genetic variation and a phenotype of interest. Due to the rarity of most of the identified variants, standard analytic practice now includes, in addition to single-variant tests, a new set of statistical tests that consider simultaneously all genetic variability in a small chosen region of the genome. However, the question of how to set appropriate genome-wide significance thresholds for these region-based tests has received little consideration. To control the family-wise error rate, estimates of the effective number of independent tests, m_e, are required. Although for single-variant tests m_e depends primarily on the linkage disequilibrium, for region-based tests the choice of regions, of weights, and of test statistics will also influence m_e. Therefore, m_e will need to be estimated for each analytic plan. In this chapter, we review a recently proposed method for using the patterns of correlation between test statistics to estimate the required significance thresholds. In this approach, extrapolation from small sections of the genome to the whole genome can provide computationally feasible estimators for genome-wide significance thresholds. We also discuss other factors that may need consideration, such as exome sequencing, the use of false discovery rates for controlling type 1 errors, and region definitions that are not based on physical proximity.
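To make the role of the effective number of tests concrete, the following Python sketch shows one way such a quantity could feed into a genome-wide threshold. It is an illustration under stated assumptions, not the method reviewed in the chapter: the eigenvalue-based estimator is a simple published-style heuristic swapped in for demonstration, the linear scaling from sampled segments to the whole genome is an assumption, and the function names (effective_number_of_tests, fwer_thresholds, extrapolate_m_e) are hypothetical.

```python
import numpy as np


def effective_number_of_tests(corr):
    """Eigenvalue-based estimate of the effective number of independent tests.

    Assumption: each eigenvalue x of the test-statistic correlation matrix
    contributes I(x >= 1) + (x - floor(x)). This is one heuristic among
    several and is not necessarily the estimator used in the chapter.
    """
    eigvals = np.linalg.eigvalsh(np.asarray(corr, dtype=float))
    # Round to suppress floating-point noise near integer eigenvalues.
    x = np.round(np.abs(eigvals), 10)
    return float(np.sum((x >= 1).astype(float) + (x - np.floor(x))))


def fwer_thresholds(m_e, alpha=0.05):
    """Per-test p-value thresholds controlling the family-wise error rate."""
    bonferroni = alpha / m_e
    # Sidak: 1 - (1 - alpha)^(1 / m_e), written in a numerically stable form.
    sidak = -np.expm1(np.log1p(-alpha) / m_e)
    return bonferroni, float(sidak)


def extrapolate_m_e(segment_m_e, segment_tests, genome_tests):
    """Scale the pooled m_e / m ratio from sampled segments to the genome.

    Assumption: the ratio of effective to actual tests observed in the
    sampled segments holds genome-wide (simple linear extrapolation).
    """
    ratio = sum(segment_m_e) / sum(segment_tests)
    return ratio * genome_tests


if __name__ == "__main__":
    # Four uncorrelated tests versus four perfectly correlated tests.
    print(effective_number_of_tests(np.eye(4)))
    print(effective_number_of_tests(np.ones((4, 4))))

    # Hypothetical segment-level estimates scaled to a hypothetical genome.
    m_e_genome = extrapolate_m_e([50, 70], [100, 100], genome_tests=1000)
    print(m_e_genome, fwer_thresholds(m_e_genome))
```

In this sketch, uncorrelated tests yield an estimate equal to the number of tests, while perfectly correlated tests collapse to a single effective test. In practice the correlation matrix would have to be estimated from the test statistics of the chosen analytic plan, since region definitions, weights, and test statistics all affect m_e.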



Acknowledgments

The authors are supported by CIHR operating grant MOP-115110 to CG and AC and also by MITACS, the Mathematics of Information Technology and Complex Systems, part of the Canadian Networks of Centres of Excellence program. This study makes use of data generated by the UK10K Consortium, derived from samples from UK10K_COHORTS_TWINSUK (The TwinsUK Cohort) and UK10K_COHORT_ALSPAC (the Avon Longitudinal Study of Parents and Children). A full list of the investigators who contributed to the generation of the data is available from www.UK10K.org. Funding for UK10K was provided by the Wellcome Trust under award WT091310.

Corresponding author

Correspondence to Celia M. T. Greenwood.

Copyright information

© 2015 Springer Science+Business Media New York

About this chapter

Cite this chapter

Greenwood, C.M.T., Xu, C., Ciampi, A. (2015). Significance Thresholds for Rare Variant Signals. In: Zeggini, E., Morris, A. (eds) Assessing Rare Variation in Complex Traits. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2824-8_12
