Efficient and Accurate Multiple-Phenotypes Regression Method for High Dimensional Data Considering Population Structure

  • Jong Wha J. Joo
  • Eun Yong Kang
  • Elin Org
  • Nick Furlotte
  • Brian Parks
  • Aldons J. Lusis
  • Eleazar Eskin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9029)


A typical GWAS tests the correlation between a single phenotype and each genotype, one at a time. However, it is often very useful to analyze many phenotypes simultaneously. For example, doing so may increase the power to detect variants by capturing unmeasured aspects of complex biological networks that a single phenotype might miss. Several multivariate approaches try to detect variants related to many phenotypes, but none of them accounts for population structure, and each may therefore produce a significant number of false positive identifications. Here, we introduce a new methodology, referred to as GAMMA, that both analyzes many phenotypes simultaneously and corrects for population structure. In a simulation study, GAMMA accurately identifies true genetic effects without false positive identifications, while other methods either fail to detect true effects or produce many false positive identifications. We further apply our method to genetic studies of yeast and of the mouse gut microbiome and show that GAMMA identifies several variants that are likely to reflect a true biological mechanism.


Linear Mixed Model; Multivariate Normal Distribution; Multiple Phenotype; Yeast Dataset; Hybrid Mouse Diversity Panel
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Jong Wha J. Joo (1)
  • Eun Yong Kang (2)
  • Elin Org (3)
  • Nick Furlotte (2)
  • Brian Parks (3)
  • Aldons J. Lusis (3, 4, 5)
  • Eleazar Eskin (1, 2, 5), corresponding author

  1. Bioinformatics IDP, University of California, Los Angeles, USA
  2. Computer Science Department, University of California, Los Angeles, USA
  3. Department of Medicine, University of California, Los Angeles, USA
  4. Department of Microbiology, Immunology and Molecular Genetics, University of California, Los Angeles, USA
  5. Department of Human Genetics, University of California, Los Angeles, USA
