Abstract
The analysis of genome-wide association studies (GWAS) poses statistical hurdles that must be handled efficiently for a study to succeed. The two largest impediments in the analysis phase are the multiple comparisons problem and maintaining robustness against confounding due to population admixture and stratification. For quantitative traits in family-based designs, Van Steen et al. (1) proposed a two-stage testing strategy that can be considered a hybrid between family-based and population-based analysis. By incorporating a population-based component into the family-based analysis, the Van Steen algorithm maximizes statistical power while maintaining the original robustness of family-based association tests (FBATs) (2–4). The Van Steen approach consists of two statistically independent steps, a screening step and a testing step. For all genotyped single nucleotide polymorphisms (SNPs), the screening step examines the evidence for association at a population-based level. Based on the support for a potential genetic association found in the screening step, the SNPs are prioritized for testing in the next step, where they are analyzed with an FBAT (3). Because the screening step exploits population-based information that is not used in the family-based association testing step, the two steps are statistically independent. Therefore, using the population-based data for screening does not bias the FBAT statistic calculated in the testing step. Depending on the trait type and the ascertainment conditions, Van Steen-type testing strategies can achieve statistical power comparable to that of population-based studies with the same number of probands. In this chapter, we review the original Van Steen algorithm and its numerous extensions, and discuss its advantages and disadvantages.
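The screen-then-test logic described in the abstract can be illustrated with a minimal sketch. It assumes that a population-based screening score (for example, an estimated power) and an FBAT p-value have already been computed for each SNP by other software. The function name `two_stage_test`, the argument names, and the choice of a top-K selection followed by a Bonferroni correction over the K selected SNPs are illustrative assumptions, not the chapter's definitive procedure.

```python
from typing import Dict, List


def two_stage_test(
    screen_score: Dict[str, float],
    fbat_pvalue: Dict[str, float],
    top_k: int,
    alpha: float = 0.05,
) -> List[str]:
    """Hypothetical sketch of a two-stage screen-then-test strategy.

    screen_score: per-SNP score from the population-based screening step
                  (assumed precomputed; higher means more promising).
    fbat_pvalue:  per-SNP p-value from the family-based test
                  (assumed precomputed and independent of the screen).
    top_k:        number of top-ranked SNPs carried into the testing step.
    """
    # Step 1 (screening): rank all SNPs by the population-based score.
    ranked = sorted(screen_score, key=lambda snp: screen_score[snp], reverse=True)
    selected = ranked[:top_k]
    if not selected:
        return []

    # Step 2 (testing): only the selected SNPs are tested, so the
    # multiple-comparison adjustment is over K tests, not all SNPs.
    threshold = alpha / len(selected)
    return [snp for snp in selected if fbat_pvalue[snp] <= threshold]


# Toy, made-up numbers purely for illustration.
scores = {"rs1": 0.9, "rs2": 0.2, "rs3": 0.7, "rs4": 0.1}
pvalues = {"rs1": 0.01, "rs2": 0.0001, "rs3": 0.2, "rs4": 0.5}
significant = two_stage_test(scores, pvalues, top_k=2)
print(significant)
```

In this toy input, `rs2` has the smallest FBAT p-value but is never tested because it ranks low in the screen. That is the trade-off of the approach: fewer tests to correct for, at the cost of possibly missing SNPs the screen fails to prioritize.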
References
Van Steen, K., McQueen, M., Herbert, A., Raby, B., Lyon, H., DeMeo, D., Murphy, A., Su, J., Datta, S., Rosenow, C., et al. (2005). Genomic screening and replication using the same data set in family-based association testing. Nature Genetics, 37, 683–691.
Spielman, R., McGinnis, R., and Ewens, W. (1993). Transmission test for linkage disequilibrium: The insulin gene region and insulin-dependent diabetes mellitus (IDDM). American Journal of Human Genetics, 52, 506–516.
Laird, N., Horvath, S., and Xu, X. (2000). Implementing a unified approach to family-based tests of association. Genetic Epidemiology, 19, S36.
Laird, N. and Lange, C. (2006). Family-based designs in the age of large-scale gene-association studies. Nature Reviews Genetics, 7(5), 385–394.
The International HapMap Consortium. (2005). A haplotype map of the human genome. Nature, 437, 1299–1320.
The International HapMap Consortium. (2007). A second generation human haplotype map of over 3.1 million SNPs. Nature, 449, 851–861.
Matsuzaki, H., Dong, S., Loi, H., Di, X., Liu, G., Hubbell, E., Law, J., Berntsen, T., Chadha, M., Hui, H., et al. (2004). Genotyping over 100,000 SNPs on a pair of oligonucleotide arrays. Nature Methods, 1, 109–111.
Di, X., Matsuzaki, H., Webster, T. A., Hubbell, E., Liu, G., Dong, S., Bartell, D., Huang, J., Chiles, R., Yang, G., et al. (2005). Dynamic model based algorithms for screening and genotyping over 100K SNPs on oligonucleotide microarrays. Bioinformatics, 21, 1958–1963.
Gunderson, K., Kuhn, K., Steemers, F., Ng, P., Murray, S., and Shen, R. (2006). Whole-genome genotyping of haplotype tag single nucleotide polymorphisms. Pharmacogenomics, 7, 641–648.
Wadman, M. (2006). The chips are down. Nature Digest, 444, 256–257.
Klein, R. J., Zeiss, C., Chew, E. Y., Tsai, J. Y., Sackler, R. S., Haynes, C., Henning, A. K., Sangiovanni, J. P., Mane, S. M., Mayne, S. T., et al. (2005). Complement factor H polymorphism in age-related macular degeneration. Science, 308, 385–389.
Herbert, A., Gerry, N., McQueen, M., Heid, I., Pfeufer, A., Illig, T., Wichmann, E.-H., Meitinger, T., Hunter, D., Hu, F., et al. (2006). Genetic variation near INSIG2 is a common determinant of obesity in Western Europeans and African Americans. Science, 312, 279–283.
Zeggini, E., Weedon, M. N., Lindgren, C. M., Frayling, T. M., Elliott, K. S., Lango, H., Timpson, N. J., Perry, J. R., Rayner, N. W., Freathy, R. M., et al. (2007). Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science, 316, 1336–1341.
Wellcome Trust Case Control Consortium. (2007). Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature, 447, 661–678.
Easton, D. F., Pooley, K. A., Dunning, A. M., Pharoah, P. D., Thompson, D., Ballinger, D. G., Struewing, J. P., Morrison, J., Field, H., Luben, R., et al. (2007). Genome-wide association study identifies novel breast cancer susceptibility loci. Nature, 447, 1087–1093.
Buch, S., Schafmayer, C., Volzke, H., Becker, C., Franke, A., von Eller-Eberstein, H., Kluck, C., Bassmann, I., Brosch, M., Lammert, F., et al. (2007). A genome-wide association scan identifies the hepatic cholesterol transporter ABCG8 as a susceptibility factor for human gallstone disease. Nature Genetics, 39, 995–999.
Bierut, L. J., Madden, P. A., Breslau, N., Johnson, E. O., Hatsukami, D., Pomerleau, O. F., Swan, G. E., Rutter, J., Bertelsen, S., Fox, L., et al. (2007). Novel genes identified in a high-density genome wide association study for nicotine dependence. Human Molecular Genetics, 16, 24–35.
Zanke, B. W., Greenwood, C. M., Rangrej, J., Kustra, R., Tenesa, A., Farrington, S. M., Prendergast, J., Olschwang, S., Chiang, T., Crowdy, E., et al. (2007). Genome-wide association scan identifies a colorectal cancer susceptibility locus on chromosome 8q24. Nature Genetics, 39, 989–994.
Yeager, M., Orr, N., Hayes, R. B., Jacobs, K. B., Kraft, P., Wacholder, S., Minichiello, M. J., Fearnhead, P., Yu, K., Chatterjee, N., et al. (2007). Genome-wide association study of prostate cancer identifies a second risk locus at 8q24. Nature Genetics, 39, 645–649.
Winkelmann, J., Schormair, B., Lichtner, P., Ripke, S., Xiong, L., Jalilzadeh, S., Fulda, S., Putz, B., Eckstein, G., Hauk, S., et al. (2007). Genome-wide association study of restless legs syndrome identifies common variants in three genomic regions. Nature Genetics, 39, 1000–1006.
Sladek, R., Rocheleau, G., Rung, J., Dina, C., Shen, L., Serre, D., Boutin, P., Vincent, D., Belisle, A., Hadjadj, S., et al. (2007). A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature, 445, 881–885.
Frayling, T., Timpson, N., Weedon, M., Zeggini, E., Freathy, R., Lindgren, C., Perry, J., Elliott, K., Lango, H., Rayner, N., et al. (2007). A Common Variant in the FTO Gene Is Associated with Body Mass Index and Predisposes to Childhood and Adult Obesity. Science, 316, 889.
Saxena, R., Voight, B., Lyssenko, V., Burtt, N., de Bakker, P., Chen, H., Roix, J., Kathiresan, S., Hirschhorn, J., Daly, M., et al. (2007). Genome-Wide Association Analysis Identifies Loci for Type 2 Diabetes and Triglyceride Levels. Science, 316, 1331–1336.
Scott, L., Mohlke, K., Bonnycastle, L., Willer, C., Li, Y., Duren, W., Erdos, M., Stringham, H., Chines, P., Jackson, A., et al. (2007). A Genome-Wide Association Study of Type 2 Diabetes in Finns Detects Multiple Susceptibility Variants. Science, 316, 1341.
Lettre, G., Jackson, A., Gieger, C., Schumacher, F., Berndt, S., Sanna, S., Eyheramendy, S., Voight, B., Butler, J., Guiducci, C., et al. Identification of ten loci associated with height highlights new biological pathways in human growth. Nature, 200, 8.
Neale, B., Lasky-Su, J., Anney, R., Franke, B., Zhou, K., Maller, J., Vasquez, A., Asherson, P., Chen, W., Banaschewski, T., et al. (2008). Genome-wide association scan of attention deficit hyperactivity disorder. American Journal of Medical Genetics B Neuropsychiatric Genetics, 147, 1337–1344.
Lasky-Su, J., Anney, R., Neale, B., Franke, B., Zhou, K., Maller, J., Vasquez, A., Chen, W., Asherson, P., Buitelaar, J., et al. (2008). Genome-wide association scan of the time to onset of attention deficit hyperactivity disorder. American Journal of Medical Genetics B Neuropsychiatric Genetics, 147, 1355–1358.
Kathiresan, S., Willer, C., Peloso, G., Demissie, S., Musunuru, K., Schadt, E., Kaplan, L., Bennett, D., Li, Y., Tanaka, T., et al. (2009). Common variants at 30 loci contribute to polygenic dyslipidemia. Nature Genetics, 41, 56–65.
Lasky-Su, J., Lyon, H., Emilsson, V., Heid, I., Molony, C., Raby, B., Lazarus, R., Klanderman, B., Soto-Quiros, M., Avila, L., et al. (2008). On the Replication of Genetic Associations: Timing Can Be Everything! The American Journal of Human Genetics, 82, 849–858.
Lasky-Su, J., Neale, B., Franke, B., Anney, R., Zhou, K., Maller, J., Vasquez, A., Chen, W., Asherson, P., Buitelaar, J., et al. (2008). Genome-wide association scan of quantitative traits for attention deficit hyperactivity disorder identifies novel associations and confirms candidate gene associations. American Journal of Medical Genetics B Neuropsychiatric Genetics, 147, 1345–1354.
Bertram, L., Lange, C., Mullin, K., Parkinson, M., Hsiao, M., Hogan, M., Schjeide, B., Hooli, B., DiVito, J., Ionita, I., et al. (2008). Genome-wide Association Analysis Reveals Putative Alzheimer’s Disease Susceptibility Loci in Addition to APOE. American Journal of Human Genetics, 83, 623–632.
Satagopan, J. and Elston, R. (2003). Optimal two-stage genotyping in population-based association studies. Genetic Epidemiology, 25, 149–157.
Satagopan, J., Venkatraman, E., and Begg, C. (2004). Two-stage designs for gene-disease association studies with sample size constraints. Biometrics, 60, 589–597.
Satagopan, J., Verbel, D., Venkatraman, E., Offit, K., and Begg, C. (2004). Two-stage designs for gene-disease association studies. Biometrics, 58, 163–170.
Thomas, D., Xie, R., and Gebregziabher, M. (2004). Two-stage sampling designs for gene association studies. Genetic Epidemiology, 27, 401–414.
Hirschhorn, J. and Daly, M. (2005). Genome-wide association studies for common diseases and complex traits. Nature Reviews Genetics, 6, 95–108.
Evangelou, E., Maraganore, D., and Ioannidis, J. (2007). Meta-analysis in genome-wide association datasets: Strategies and application in Parkinson disease. PLoS ONE, 2, e196.
Ioannidis, J. P., Patsopoulos, N. A., and Evangelou, E. (2007). Heterogeneity in meta-analyses of genome-wide association investigations. PLoS ONE, 2, e841.
Scott, L. J., Mohlke, K. L., Bonnycastle, L. L., Willer, C. J., Li, Y., Duren, W. L., Erdos, M. R., Stringham, H. M., Chines, P. S., Jackson, A. U., et al. (2007). A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science, 316, 1341–1345.
Saxena, R., Voight, B. F., Lyssenko, V., Burtt, N. P., de Bakker, P. I., Chen, H., Roix, J. J., Kathiresan, S., Hirschhorn, J. N., Daly, M. J., et al. (2007). Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science, 316, 1331–1336.
Spielman, R. and Ewens, W. (1998). A sibship test for linkage in the presence of association. American Journal of Human Genetics, 62, 450–458.
Martin, E., Bass, M., and Kaplan, N. (2001). Correcting for a potential bias in the pedigree disequilibrium test. American Journal of Human Genetics, 68, 1065–1067.
Monks, S. and Kaplan, N. (2000). Removing the sampling restrictions from family-based tests of association for a quantitative-trait locus. American Journal of Human Genetics, 66, 576–592.
Chen, W. and Abecasis, G. (2007). Family-based association tests for genomewide association scans. American Journal of Human Genetics, 81, 913–926.
Aulchenko, Y., de Koning, D., and Haley, C. (2007). Genomewide rapid association using mixed model and regression: A fast and simple method for genomewide pedigree-based quantitative trait loci association analysis. Genetics, 177, 577.
Macgregor, S. (2008). Optimal two-stage testing for family-based genome-wide association studies. American Journal of Human Genetics, 82, 797–799.
Devlin, B. and Roeder, K. (1999). Genomic control for association studies. Biometrics, 55, 997–1004.
Bacanu, S., Devlin, B., and Roeder, K. (2000). The power of genomic control. American Journal of Human Genetics, 66, 1933–1944.
Devlin, B., Roeder, K., and Wasserman, L. (2001). Genomic control, a new approach to genetic-based association studies. Theoretical Population Biology, 60, 155–166.
Price, A., Patterson, N., Plenge, R., Weinblatt, M., Shadick, N., and Reich, D. (2006). Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics, 38, 904–909.
Ionita-Laza, I., McQueen, M., Laird, N., and Lange, C. (2007). Genomewide weighted hypothesis testing in family-based association studies, with an application to a 100K scan. American Journal of Human Genetics, 81, 607–614.
Feng, T., Zhang, S., and Sha, Q. (2007). Two-stage association tests for genome-wide association studies based on family data with arbitrary family structure. European Journal of Human Genetics, 15, 1169–1175.
Murphy, A., Weiss, S., and Lange, C. (2008). Screening and replication using the same data set: Testing strategies for family-based studies in which all probands are affected. PLoS Genetics, 4(9), e1000197.
Lange, C., DeMeo, D., Silverman, E., Weiss, S., and Laird, N. (2003). Using the noninformative families in family-based association tests: A powerful new testing strategy. American Journal of Human Genetics, 73, 801–811.
Lange, C., Lyon, H., DeMeo, D., Raby, B., Silverman, E., and Weiss, S. (2003). A new powerful non-parametric two-stage approach for testing multiple phenotypes in family-based association studies. Human Heredity, 56, 10–17.
Jiang, H., Harrington, D., Raby, B., Bertram, L., Blacker, D., Weiss, S., and Lange, C. (2006). Family-based association test for time-to-onset data with time-dependent differences between the hazard functions. Genetic Epidemiology, 30(2), 124–132.
Degnan, J., Lasky-Su, J., Raby, B., Xu, M., Molony, C., Schadt, E., and Lange, C. (2008). Genomics and genome-wide association studies: An integrative approach to expression QTL mapping. Genomics, 92, 129–133.
Rabinowitz, D. and Laird, N. (2000). A unified approach to adjusting association tests for population admixture with arbitrary pedigree structure and arbitrary missing marker information. Human Heredity, 50, 211–223.
Clayton, D. and Jones, H. (1999). Transmission/disequilibrium tests for extended marker haplotypes. American Journal of Human Genetics, 65, 1161–1169.
Lunetta, K., Faraone, S., Biederman, J., and Laird, N. (2000). Family-based tests of association and linkage that use unaffected sibs, covariates, and interactions. American Journal of Human Genetics, 66, 605–614.
Whittaker, J. and Lewis, C. (1998). Power comparisons of the transmission/disequilibrium test and sib transmission/disequilibrium-test statistics. American Journal of Human Genetics, 65, 578–580.
Lange, C., DeMeo, D., and Laird, N. (2002). Power and design considerations for a general class of family-based association tests: Quantitative traits. American Journal of Human Genetics, 71, 1330–1341.
Lange, C. and Laird, N. (2002). On a general class of conditional tests for family-based association studies in genetics: the asymptotic distribution, the conditional power and optimality considerations. Genetic Epidemiology, 23, 165–180.
Mokliatchouk, O., Blacker, D., and Rabinowitz, D. (2001). Association tests for traits with variable age at onset. Human Heredity, 51, 46–53.
Horvath, S., Xu, X., and Laird, N. (2001). The family based association test method: strategies for studying general genotype-phenotype associations. European Journal of Human Genetics, 9, 301–306.
Lange, C., Blacker, D., and Laird, N. (2004). Family-based association tests for survival and times-to-onset analysis. Statistics in Medicine, 23, 179–189.
Lange, C., Silverman, E., Xu, X., Weiss, S., and Laird, N. (2003a). A multivariate family-based association test using generalized estimating equations: FBAT-GEE. Biostatistics, 4, 195–206.
Lange, C., Van Steen, K., Andrew, T., Lyon, H., DeMeo, D., Murphy, A., Silverman, E., A, M., Weiss, S., and Laird, N. (2004). A family-based association test for repeatedly measured quantitative traits adjusting for unknown environmental and/or polygenic effects. Statistical Applications in Genetics and Molecular Biology: Vol. 3: No. 1, Article 17. http://www.bepress.com/sagmb/vol3/iss1/art17.
Murphy, A., Blacker, D., and Lange, C. (2004). Imputing missing phenotypes: A new FBAT-statistic. Statistical Modelling, 4, 96–100.
Murphy, A., Van Steen, K., and Lange, C. (2004). On missing phenotype data in multivariate family based association tests: imputation strategies based on the EM-algorithm, the DA-algorithm and the conditional mean model. Far East Journal of Theoretical Statistics, 13, 175–188.
Schaid, D. and Sommer, S. (1994). Comparison of statistics for candidate-gene association studies using cases and parents. American Journal of Human Genetics, 55, 402–409.
Fulker, D., Cherny, S., Sham, P., and Hewitt, J. (1999). Combined linkage and association sib-pair analysis for quantitative traits. Encyclopedia of Human Genetics and Genetic Epidemiology, 64, 259–267.
Lange, C., DeMeo, D., Silverman, E., Weiss, S., and Laird, N. (2004). PBAT: tools for family-based association studies. American Journal of Human Genetics, 74, 367–369.
Van Steen, K. and Lange, C. (2005). PBAT: a comprehensive software package for genome-wide association analysis of complex family based studies. Human Genomics, 2, 67–69.
Hoffmann, T. and Lange, C. (2006). P2BAT: a massive parallel implementation of PBAT for genome-wide association studies in R. Bioinformatics, 22(24), 3103–3105.
McQueen, M., Weiss, S., Laird, N., and Lange, C. (2007). On the parsing of statistical information in family-based association testing. Nature Genetics, 39, 281–282.
Rosskopf, D., Bornhorst, A., Rimmbach, C., Schwahn, C., Kayser, A., Kruger, A., Tessmann, G., Geissler, I., Kroemer, H., and Volzke, H. (2007). Comment on “a common genetic variant is associated with adult and childhood obesity”. Science, 315, 187.
Hall, D., Rahman, T., Avery, P., and Keavney, B. (2006). INSIG-2 promoter polymorphism and obesity related phenotypes: association study in 1428 members of 248 families. BMC Medical Genetics, 7, 83.
Dina, C., Meyre, D., Samson, C., Tichet, J., Marre, M., Jouret, B., Charles, M., Balkau, B., and Froguel, P. (2007). Comment on “a common genetic variant is associated with adult and childhood obesity”. Science, 315, 187.
Loos, R., Barroso, I., O’Rahilly, S., and Wareham, N. (2007). Comment on “a common genetic variant is associated with adult and childhood obesity”. Science, 315, 187.
Lyon, H., Emilsson, V., Hinney, A., Heid, I., Lasky-Su, J., Zhu, X., Thorleifsson, G., Gunnarsdottir, S., Walters, G., Thorsteinsdottir, U., et al. (2007). The association of a SNP upstream of INSIG2 with body mass index is reproduced in several but not all cohorts. PLoS Genetics, 3, e61.
Smith, A., Cooper, J., Li, L., and Humphries, S. (2007). INSIG2 gene polymorphism is not associated with obesity in Caucasian, Afro-Caribbean and Indian subjects. International Journal of Obesity, 31, 1753–1755.
Kumar, J., Sunkishala, R., Karthikeyan, G., and Sengupta, S. (2007). The common genetic variant upstream of INSIG2 gene is not associated with obesity in Indian population. Clinical Genetics, 71, 415–418.
© 2010 Humana Press, a part of Springer Science+Business Media, LLC
Cite this protocol
Murphy, A., Weiss, S.T., Lange, C. (2010). Two-Stage Testing Strategies for Genome-Wide Association Studies in Family-Based Designs. In: Bang, H., Zhou, X., van Epps, H., Mazumdar, M. (eds) Statistical Methods in Molecular Biology. Methods in Molecular Biology, vol 620. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-60761-580-4_17
DOI: https://doi.org/10.1007/978-1-60761-580-4_17
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-60761-578-1
Online ISBN: 978-1-60761-580-4
eBook Packages: Springer Protocols