Abstract
With the advent of large-scale DNA sequencing studies, it is worth considering how to estimate the required significance threshold for tests of association between the resulting genetic variation and a phenotype of interest. Due to the rarity of most of the identified variants, standard analytic practice now includes, in addition to single-variant tests, a new set of statistical tests that consider simultaneously all genetic variability in a small chosen region of the genome. However, the question of how to set appropriate genome-wide significance thresholds for these region-based tests has received little consideration. To control the family-wise error rate, estimates of the effective number of independent tests, m_e, are required. Although for single-variant tests, m_e depends primarily on the linkage disequilibrium, for region-based tests, the choice of regions, of weights, and of test statistics will also influence m_e. Therefore, m_e will need to be estimated for each analytic plan. In this chapter, we review a recently proposed method for using the patterns of correlation between test statistics to estimate the required significance thresholds. In this approach, extrapolation from small sections of the genome to the whole genome can provide computationally feasible estimators for genome-wide significance thresholds. We also discuss other factors that may need consideration, such as exome sequencing, the use of false discovery rates for controlling type 1 errors, and region definitions that are not based on physical proximity.
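The relationship between the effective number of independent tests (m_e) and a per-test significance threshold can be illustrated with a short Python sketch. It is not the estimator reviewed in the chapter: it only shows the standard Bonferroni and Šidák conversions between m_e and a threshold, plus a simple length-proportional extrapolation from a genome segment to the whole genome. The function names, the linear-scaling assumption, and all numbers in the usage example are hypothetical and chosen for illustration only.

```python
import math


def bonferroni_threshold(alpha: float, m_e: float) -> float:
    """Per-test threshold alpha / m_e (conservative under any dependence)."""
    return alpha / m_e


def sidak_threshold(alpha: float, m_e: float) -> float:
    """Per-test threshold 1 - (1 - alpha)**(1 / m_e), exact for m_e independent tests.

    Written with log1p/expm1 so that it stays accurate when m_e is very large.
    """
    return -math.expm1(math.log1p(-alpha) / m_e)


def effective_tests_from_threshold(alpha: float, per_test_threshold: float) -> float:
    """Invert the Sidak formula: the m_e implied by an empirical per-test threshold.

    The per-test threshold could come, for example, from the alpha-quantile of
    minimum p-values across permutations (an assumption of this sketch).
    """
    return math.log1p(-alpha) / math.log1p(-per_test_threshold)


def extrapolate_m_e(m_e_segment: float, segment_length: float, genome_length: float) -> float:
    """Scale a segment-level m_e to the genome, ASSUMING m_e grows linearly with length."""
    return m_e_segment * genome_length / segment_length


if __name__ == "__main__":
    # Hypothetical inputs: m_e = 500 estimated on a 10 Mb segment of a 3 Gb genome.
    m_e_genome = extrapolate_m_e(500.0, 10e6, 3e9)
    print("extrapolated m_e:", m_e_genome)
    print("Bonferroni threshold:", bonferroni_threshold(0.05, m_e_genome))
    print("Sidak threshold:", sidak_threshold(0.05, m_e_genome))
```

Because m_e depends on the choice of regions, weights, and test statistics, the segment-level value fed into such a calculation would have to be re-estimated for each analytic plan, and the linear extrapolation should be checked against several segments before it is trusted.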
Acknowledgments
The authors are supported by CIHR operating grant MOP-115110 to CG and AC and also by MITACS, the Mathematics of Information Technology and Complex Systems, part of the Canadian Networks of Centres of Excellence program. This study makes use of data generated by the UK10K Consortium, derived from samples from UK10K_COHORTS_TWINSUK (The TwinsUK Cohort) and UK10K_COHORT_ALSPAC (the Avon Longitudinal Study of Parents and Children). A full list of the investigators who contributed to the generation of the data is available from www.UK10K.org. Funding for UK10K was provided by the Wellcome Trust under award WT091310.
Copyright information
© 2015 Springer Science+Business Media New York
About this chapter
Cite this chapter
Greenwood, C.M.T., Xu, C., Ciampi, A. (2015). Significance Thresholds for Rare Variant Signals. In: Zeggini, E., Morris, A. (eds) Assessing Rare Variation in Complex Traits. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2824-8_12
DOI: https://doi.org/10.1007/978-1-4939-2824-8_12
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-2823-1
Online ISBN: 978-1-4939-2824-8
eBook Packages: Biomedical and Life Sciences (R0)