Abstract
Large-scale sequence-based association analysis is a powerful approach for identifying rare variants involved in complex trait etiologies. Significant findings from stage 1 must be confirmed through replication in an independent stage 2 sample to avoid reporting spurious results. For gene-based mapping of rare variants, where rare variants within a region are analyzed in aggregate, three replication strategies are possible: (1) variant-based replication, in which only variants at nucleotide sites uncovered in stage 1 within the gene region are genotyped and followed up; (2) sequence-based replication, in which the gene region is sequenced in the replication sample and both known and novel variants are tested; and (3) exome-array-based replication, in which the gene region identified in the stage 1 sample is followed up using exome arrays in the stage 2 sample. The efficiency of the three strategies depends on the proportion of causative variants discovered in stage 1, sequencing and genotyping errors, the trait-specific genetic architecture, and the number of variants within the identified gene region that are available for genotyping on the exome array. Using rigorous population genetic and phenotypic models, it is demonstrated that sequence-based replication is consistently more powerful than variant- and exome-array-based replication, although the power gain can be small. For variant-based replication, if the stage 1 sample consists of several thousand individuals, a large fraction of causative variant sites can be observed, and even for smaller stage 1 studies, the uncovered variants can explain a large proportion of the locus population attributable risk. Exome-array-based replication can have power comparable to the other two approaches if the coding variants driving the association are well represented on the array.
Consequently, although sequence-based replication is usually more powerful and is also valuable for identifying novel, potentially causal variants, both variant- and exome-array-based replication can be viable and cost-effective approaches for replicating rare variant associations.
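The point about stage 1 sample size can be illustrated with a small back-of-the-envelope calculation. The sketch below is not the population genetic model used in the chapter; it only assumes that the 2N chromosomes in a sample of N individuals are drawn independently and that there are no sequencing errors, so a variant with minor allele frequency q is seen at least once with probability 1 - (1 - q)^(2N). The function names and the list of allele frequencies are hypothetical and chosen purely for illustration.

```python
def prob_site_observed(maf: float, n_individuals: int) -> float:
    """Probability that at least one copy of a variant with minor allele
    frequency `maf` appears among 2 * n_individuals chromosomes.

    Assumes independent sampling of chromosomes and no sequencing errors.
    """
    if not 0.0 <= maf <= 1.0:
        raise ValueError("maf must lie in [0, 1]")
    if n_individuals < 0:
        raise ValueError("n_individuals must be non-negative")
    return 1.0 - (1.0 - maf) ** (2 * n_individuals)


def expected_fraction_observed(mafs, n_individuals: int) -> float:
    """Expected fraction of variant sites seen at least once in stage 1."""
    mafs = list(mafs)
    if not mafs:
        raise ValueError("mafs must not be empty")
    total = sum(prob_site_observed(q, n_individuals) for q in mafs)
    return total / len(mafs)


if __name__ == "__main__":
    # Hypothetical allele frequencies, for illustration only.
    hypothetical_mafs = [0.0001, 0.0005, 0.001, 0.005]
    for n in (500, 1000, 5000):
        frac = expected_fraction_observed(hypothetical_mafs, n)
        print(f"N = {n}: expected fraction of sites observed = {frac:.3f}")
```

Under these simplifying assumptions, the expected fraction of sites observed rises with N, and the rarest variants are the ones most likely to be missed in smaller stage 1 samples.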
Acknowledgments
This work was supported by National Institutes of Health grants HL102926 and MD005964.
Copyright information
© 2015 Springer Science+Business Media New York
About this chapter
Cite this chapter
Liu, D.J., Leal, S.M. (2015). Replicating Sequencing-Based Association Studies of Rare Variants. In: Zeggini, E., Morris, A. (eds) Assessing Rare Variation in Complex Traits. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2824-8_14
DOI: https://doi.org/10.1007/978-1-4939-2824-8_14
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-2823-1
Online ISBN: 978-1-4939-2824-8
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)