Statistical Analysis of GWAS

Phenotypes and Genotypes

Part of the book series: Computational Biology ((COBO,volume 18))

Abstract

This chapter discusses the statistical analysis of genome-wide association studies. After briefly touching on genotype calling and the imputation of missing data, we are mainly concerned with the downstream analysis of association, where the genotype of each individual at each SNP has already been established. The classical approach of single marker tests combined with multiple testing correction is contrasted with different model selection strategies, which tend to perform much better at correctly identifying causal SNPs in the case of complex traits. Specific methods for handling rare SNPs, as well as population stratification, are discussed. Admixture mapping and the analysis of gene–gene interactions are among the more advanced topics also considered here.
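
To make the "single marker tests combined with multiple testing correction" workflow concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the chapter's own procedure: the allelic 2x2 chi-square test, the Bonferroni rule, and the Benjamini-Hochberg step-up rule are chosen here as common examples, and all function names and allele counts are hypothetical.

```python
import math


def allelic_chi2_pvalue(case_counts, control_counts):
    """P-value of a 2x2 allelic chi-square test (1 degree of freedom).

    Each argument is a hypothetical (ref_allele_count, alt_allele_count) pair.
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero margin (e.g. a monomorphic SNP) carries no information.
        return 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def bonferroni(pvalues, alpha=0.05):
    """Reject H0 for p <= alpha / m (family-wise error rate control)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Step-up rule: reject the k smallest p-values, where k is the largest
    rank with p_(k) <= alpha * k / m (false discovery rate control)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


# Toy data (made up): allele counts per SNP in cases and controls.
snps = [((150, 50), (100, 100)), ((100, 100), (100, 100)), ((120, 80), (110, 90))]
pvals = [allelic_chi2_pvalue(cases, controls) for cases, controls in snps]
print(bonferroni(pvals), benjamini_hochberg(pvals))
```

Each SNP is tested on its own and the correction is applied only afterwards, which is the defining feature of the single marker approach that the abstract contrasts with model selection.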

Corresponding author

Correspondence to Florian Frommlet.

Copyright information

© 2016 Springer-Verlag London

About this chapter

Cite this chapter

Frommlet, F., Bogdan, M., Ramsey, D. (2016). Statistical Analysis of GWAS. In: Phenotypes and Genotypes. Computational Biology, vol 18. Springer, London. https://doi.org/10.1007/978-1-4471-5310-8_5

  • DOI: https://doi.org/10.1007/978-1-4471-5310-8_5

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-5309-2

  • Online ISBN: 978-1-4471-5310-8

  • eBook Packages: Computer Science (R0)
