Significance Thresholds for Rare Variant Signals

Greenwood, Celia M. T.; Xu, ChangJiang; Ciampi, Antonio

doi:10.1007/978-1-4939-2824-8_12

Abstract

With the advent of large-scale DNA sequencing studies, it is worth considering how to estimate the significance threshold required for tests of association between the resulting genetic variation and a phenotype of interest. Because most of the identified variants are rare, standard analytic practice now includes, in addition to single-variant tests, a new set of statistical tests that simultaneously consider all genetic variability in a small chosen region of the genome. However, the question of how to set appropriate genome-wide significance thresholds for these region-based tests has received little attention. To control the family-wise error rate, estimates of the effective number of independent tests, m_e, are required. For single-variant tests, m_e depends primarily on linkage disequilibrium, but for region-based tests the choice of regions, weights, and test statistics will also influence m_e. Therefore, m_e will need to be estimated for each analytic plan. In this chapter, we review a recently proposed method that uses the patterns of correlation between test statistics to estimate the required significance thresholds. In this approach, extrapolation from small sections of the genome to the whole genome can provide computationally feasible estimators of genome-wide significance thresholds. We also discuss other factors that may need consideration, such as exome sequencing, the use of false discovery rates for controlling type I errors, and region definitions that are not based on physical proximity.
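The general pipeline described in the abstract (estimate m_e from correlations between test statistics in a small section, extrapolate to the whole genome, then convert m_e into a per-test threshold that controls the family-wise error rate) can be sketched as follows. This is an illustration under stated assumptions, not the estimator proposed in the chapter. It swaps in the eigenvalue-based m_e estimator of Li and Ji (Heredity, 2005), assumes a correlation matrix between the region-based test statistics of one genomic section is already available (for example, estimated from null replicates of those statistics), and uses a hypothetical linear scaling rule to extrapolate m_e to the genome before applying the Šidák correction. All function names are illustrative.

```python
import math


def jacobi_eigenvalues(a, tol=1e-22, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations (stdlib only)."""
    n = len(a)
    a = [list(map(float, row)) for row in a]  # work on a float copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns: A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rows: A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
                a[p][q] = a[q][p] = 0.0  # discard rounding residue
    return [a[i][i] for i in range(n)]


def li_ji_m_e(corr, decimals=6):
    """Li & Ji (2005): m_e = sum_i f(|lambda_i|), f(x) = I(x >= 1) + (x - floor(x))."""
    m_e = 0.0
    for lam in jacobi_eigenvalues(corr):
        # Snap numerical noise (e.g. 2.9999999999 -> 3.0) before the floor step.
        x = round(abs(lam), decimals)
        m_e += (1.0 if x >= 1.0 else 0.0) + (x - math.floor(x))
    return m_e


def extrapolate_m_e(m_e_section, n_tests_section, n_tests_genome):
    """HYPOTHETICAL scaling rule: m_e grows in proportion to the number of tests."""
    return m_e_section * n_tests_genome / n_tests_section


def sidak_threshold(alpha, m_e):
    """Per-test level keeping the FWER at alpha across m_e independent tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m_e)


if __name__ == "__main__":
    # Toy section: 4 region-based tests forming two perfectly correlated pairs.
    corr = [[1, 1, 0, 0],
            [1, 1, 0, 0],
            [0, 0, 1, 1],
            [0, 0, 1, 1]]
    m_sec = li_ji_m_e(corr)                        # 2 effective tests out of 4
    m_gen = extrapolate_m_e(m_sec, 4, 1_000_000)   # illustrative genome-wide test count
    print(m_sec, m_gen, sidak_threshold(0.05, m_gen))
```

The linear extrapolation treats well-separated sections as roughly independent, which is plausible when linkage disequilibrium decays over distances much shorter than a section; whether it holds for a particular analytic plan (region definitions, weights, test statistics) would need to be checked. For small alpha, the Šidák threshold is close to, and slightly larger than, the Bonferroni-style threshold alpha / m_e.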

Acknowledgments

The authors are supported by CIHR operating grant MOP-115110 to CG and AC and also by MITACS, the Mathematics of Information Technology and Complex Systems, part of the Canadian Networks of Centres of Excellence program. This study makes use of data generated by the UK10K Consortium, derived from samples from UK10K_COHORTS_TWINSUK (The TwinsUK Cohort) and UK10K_COHORT_ALSPAC (the Avon Longitudinal Study of Parents and Children). A full list of the investigators who contributed to the generation of the data is available from www.UK10K.org. Funding for UK10K was provided by the Wellcome Trust under award WT091310.

Author information

Authors and Affiliations

Lady Davis Institute for Medical Research, Jewish General Hospital, 3755 Côte Sainte Catherine, Montreal, QC, Canada, H3T 1E2
Celia M. T. Greenwood & ChangJiang Xu
Departments of Oncology, Epidemiology, Biostatistics and Occupational Health, and Human Genetics, McGill University, Montreal, QC, Canada
Celia M. T. Greenwood
Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
ChangJiang Xu & Antonio Ciampi

Corresponding author

Correspondence to Celia M. T. Greenwood.

Editor information

Editors and Affiliations

Wellcome Trust Sanger Institute, Hinxton, UK
Eleftheria Zeggini Ph.D.
Department of Biostatistics, University of Liverpool, Liverpool, UK
Andrew Morris Ph.D.

About this chapter

Cite this chapter

Greenwood, C.M.T., Xu, C., Ciampi, A. (2015). Significance Thresholds for Rare Variant Signals. In: Zeggini, E., Morris, A. (eds) Assessing Rare Variation in Complex Traits. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2824-8_12

DOI: https://doi.org/10.1007/978-1-4939-2824-8_12
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-2823-1
Online ISBN: 978-1-4939-2824-8
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)