Efficient and Accurate Multiple-Phenotypes Regression Method for High Dimensional Data Considering Population Structure

Joo, Jong Wha J.; Kang, Eun Yong; Org, Elin; Furlotte, Nick; Parks, Brian; Lusis, Aldons J.; Eskin, Eleazar

doi:10.1007/978-3-319-16706-0_15

Jong Wha J. Joo⁵,
Eun Yong Kang⁶,
Elin Org⁷,
Nick Furlotte⁶,
Brian Parks⁷,
Aldons J. Lusis^7,8,9 &
…
Eleazar Eskin^5,6,9

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 9029))

Included in the following conference series:

International Conference on Research in Computational Molecular Biology

2882 Accesses
1 Citations

Abstract

A typical GWAS tests correlation between a single phenotype and each genotype one at a time. However, it is often very useful to analyze many phenotypes simultaneously. For example, this may increase the power to detect variants by capturing unmeasured aspects of complex biological networks that a single phenotype might miss. There are several multivariate approaches that try to detect variants related to many phenotypes, but none of them consider population structure and each may result in a significant number of false positive identifications. Here, we introduce a new methodology, referred to as GAMMA, that could both simultaneously analyze many phenotypes as well as correct for population structure. In a simulated study, GAMMA accurately identifies true genetic effects without false positive identifications, while other methods either fail to detect true effects or result in many false positive identifications. We further apply our method to genetic studies of yeast and gut microbiome from mouse and show that GAMMA identifies several variants that are likely to have a true biological mechanism.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Lockhart, D.J., Dong, H., Byrne, M.C., Follettie, M.T., Gallo, M.V., et al.: Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol. 14, 1675–1680 (1996)
Article Google Scholar
Gygi, S.P., Rist, B., Gerber, S.A., Turecek, F., Gelb, M.H., et al.: Quantitative analysis of complex protein mixtures using isotope-coded affinity tags. Nat. Biotechnol. 17, 994–999 (1999)
Article Google Scholar
Cervino, A.C., Li, G., Edwards, S., Zhu, J., Laurie, C., et al.: Integrating qtl and high-density snp analyses in mice to identify insig2 as a susceptibility gene for plasma cholesterol levels. Genomics 86, 505–17 (2005)
Article Google Scholar
Hillebrandt, S., Wasmuth, H.E., Weiskirchen, R., Hellerbrand, C., Keppeler, H., et al.: Complement factor 5 is a quantitative trait gene that modifies liver fibrogenesis in mice and humans. Nat. Genet. 37, 835–843 (2005)
Article Google Scholar
Wang, X., Korstanje, R., Higgins, D., Paigen, B.: Haplotype analysis in multiple crosses to identify a qtl gene. Genome. Res. 14, 1767–1772 (2004)
Article Google Scholar
O’Reilly, P.F., Hoggart, C.J., Pomyen, Y., Calboli, F.C.F., Elliott, P., et al.: Multiphen: joint model of multiple phenotypes can increase discovery in gwas. PLoS One 7, e34861 (2012)
Article Google Scholar
Alter, O., Brown, P.O., Botstein, D.: Singular value decomposition for genome-wide expression data processing and modeling. Proc. Natl. Acad. Sci. USA 97, 10101–10106 (2000)
Article Google Scholar
Quackenbush, J.: Computational analysis of microarray data. Nat. Rev. Genet. 2, 418–427 (2001)
Article Google Scholar
Nievergelt, C.M., Libiger, O., Schork, N.J.: Generalized analysis of molecular variance. PLoS Genet. 3, e51 (2007)
Article Google Scholar
Zapala, M.A., Schork, N.J.: Statistical properties of multivariate distance matrix regression for high-dimensional data analysis. Front Genet. 3, 190 (2012)
Article Google Scholar
Wessel, J., Schork, N.J.: Generalized genomic distance-based regression methodology for multilocus association analysis. Am. J. Hum. Genet. 79, 792–806 (2006)
Article Google Scholar
Kittles, R.A., Chen, W., Panguluri, R.K., Ahaghotu, C., Jackson, A., et al.: Cyp3a4-v and prostate cancer in african americans: causal or confounding association because of population stratification? Hum. Genet. 110, 553–560 (2002)
Article Google Scholar
Freedman, M.L., Reich, D., Penney, K.L., McDonald, G.J., Mignault, A.A., et al.: Assessing the impact of population stratification on genetic association studies. Nat. Genet. 36, 388–393 (2004)
Article Google Scholar
Marchini, J., Cardon, L.R., Phillips, M.S., Donnelly, P.: The effects of human population structure on large genetic association studies. Nat. Genet. 36, 512–517 (2004)
Article Google Scholar
Campbell, C.D., Ogburn, E.L., Lunetta, K.L., Lyon, H.N., Freedman, M.L., et al.: Demonstrating stratification in a european american population. Nat. Genet. 37, 868–872 (2005)
Article Google Scholar
Helgason, A., Yngvadttir, B., Hrafnkelsson, B., Gulcher, J., Stefnsson, K.: An icelandic example of the impact of population structure on association studies. Nat. Genet. 37, 90–95 (2005)
Google Scholar
Reiner, A.P., Ziv, E., Lind, D.L., Nievergelt, C.M., Schork, N.J., et al.: Population structure, admixture, and aging-related phenotypes in african american adults: the cardiovascular health study. Am. J. Hum. Genet. 76, 463–477 (2005)
Article Google Scholar
Voight, B.F., Pritchard, J.K.: Confounding from cryptic relatedness in case-control association studies. PLoS Genet. 1, e32 (2005)
Article Google Scholar
Berger, M., Stassen, H.H., Khler, K., Krane, V., Mnks, D., et al.: Hidden population substructures in an apparently homogeneous population bias association studies. Eur. J. Hum. Genet. 14, 236–244 (2006)
Article Google Scholar
Seldin, M.F., Shigeta, R., Villoslada, P., Selmi, C., Tuomilehto, J., et al.: European population substructure: clustering of northern and southern populations. PLoS Genet. 2, e143 (2006)
Article Google Scholar
Foll, M., Gaggiotti, O.: Identifying the environmental factors that determine the genetic structure of populations. Genetics 174, 875–91 (2006)
Article Google Scholar
Flint, J., Eskin, E.: Genome-wide association studies in mice. Nat. Rev. Genet. 13, 807–817 (2012)
Article Google Scholar
Zhou, X., Stephens, M.: Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nat. Methods 11, 407–409 (2014)
Article Google Scholar
Korte, A., Vilhjlmsson, B.J., Segura, V., Platt, A., Long, Q., et al.: A mixed-model approach for genome-wide association studies of correlated traits in structured populations. Nat. Genet. 44, 1066–1071 (2012)
Article Google Scholar
Kang, H.M., Ye, C., Eskin, E.: Accurate discovery of expression quantitative trait loci under confounding from spurious and genuine regulatory hotspots. Genetics 180, 1909–1925 (2008)
Article Google Scholar
Kang, H.M., Sul, J.H., Service, S.K., Zaitlen, N.A., Kong, S.Y.Y., et al.: Variance component model to account for sample structure in genome-wide association studies. Nat. Genet. 42, 348–354 (2010)
Article Google Scholar
Lippert, C., Listgarten, J., Liu, Y., Kadie, C.M., Davidson, R.I., et al.: Fast linear mixed models for genome-wide association studies. Nat. Methods 8, 833–835 (2011)
Article Google Scholar
Svishcheva, G.R., Axenovich, T.I., Belonogova, N.M., van Duijn, C.M., Aulchenko, Y.S.: Rapid variance components-based method for whole-genome association analysis. Nat. Genet. 44, 1166–1170 (2012)
Article Google Scholar
Zhou, X., Stephens, M.: Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 44, 821–824 (2012)
Article Google Scholar
Segura, V., Vilhjlmsson, B.J., Platt, A., Korte, A., Seren, U., et al.: An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations. Nat. Genet. 44, 825–830 (2012)
Article Google Scholar
Joo, J.W.J., Sul, J.H., Han, B., Ye, C., Eskin, E.: Effectively identifying regulatory hotspots while capturing expression heterogeneity in gene expression studies. Genome. Biol. 15, r61 (2014)
Article Google Scholar
Bennett, B.J., Farber, C.R., Orozco, L., Kang, H.M., Ghazalpour, A., et al.: A high-resolution association mapping panel for the dissection of complex traits in mice. Genome. Res. 20, 281–290 (2010)
Article Google Scholar
Michaelson, J.J., Loguercio, S., Beyer, A.: Detection and interpretation of expression quantitative trait loci (eqtl). Methods 48, 265–276 (2009)
Article Google Scholar
Foss, E.J., Radulovic, D., Shaffer, S.A., Ruderfer, D.M., Bedalov, A., et al.: Genetic basis of proteome variation in yeast. Nat. Genet. 39, 1369–1375 (2007)
Article Google Scholar
Perlstein, E.O., Ruderfer, D.M., Roberts, D.C., Schreiber, S.L., Kruglyak, L.: Genetic basis of individual differences in the response to small-molecule drugs in yeast. Nat. Genet. 39, 496–502 (2007)
Article Google Scholar
Devlin, B., Roeder, K., Wasserman, L.: Genomic control, a new approach to genetic-based association studies. Theor. Popul. Biol. 60, 155–166 (2001)
Article MATH Google Scholar
Ley, R.E., Bckhed, F., Turnbaugh, P., Lozupone, C.A., Knight, R.D., et al.: Obesity alters gut microbial ecology. Proc. Natl. Acad. Sci. USA 102, 11070–11075 (2005)
Article Google Scholar
Karlsson, F.H., Tremaroli, V., Nookaew, I., Bergstrm, G., Behre, C.J., et al.: Gut metagenome in european women with normal, impaired and diabetic glucose control. Nature 498, 99–103 (2013)
Article Google Scholar
Parks, B.W., Nam, E., Org, E., Kostem, E., Norheim, F., et al.: Genetic control of obesity and gut microbiota composition in response to high-fat, high-sucrose diet in mice. Cell Metab. 17, 141–152 (2013)
Article Google Scholar
Gower, J.C.: Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53, 325–338 (1966)
Article MATH MathSciNet Google Scholar
McArdle, B.H., Anderson, M.J.: Fitting multivariate models to community data: a comment on distance-based redundancy analysis. Ecology 82, 290–297 (2001)
Article Google Scholar
Bray, J.R., Curtis, J.T.: An ordination of the upland forest communities of southern wisconsin. Ecological monographs 27, 325–349 (1957)
Article Google Scholar
Brem, R.B., Kruglyak, L.: The landscape of genetic complexity across 5,700 gene expression traits in yeast. Proc. Natl. Acad. Sci. USA 102, 1572–1577 (2005)
Article Google Scholar
Bokulich, N.A., Subramanian, S., Faith, J.J., Gevers, D., Gordon, J.I., et al.: Quality-filtering vastly improves diversity estimates from illumina amplicon sequencing. Nat. Methods 10, 57–59 (2013)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Bioinformatics IDP, University of California, Los Angeles, USA
Jong Wha J. Joo & Eleazar Eskin
Computer Science Department, University of California, Los Angeles, USA
Eun Yong Kang, Nick Furlotte & Eleazar Eskin
Department of Medicine, University of California, Los Angeles, USA
Elin Org, Brian Parks & Aldons J. Lusis
Department of Microbiology, Immunology and Molecular Genetics, University of California, Los Angeles, USA
Aldons J. Lusis
Department of Human Genetics, University of California, Los Angeles, USA
Aldons J. Lusis & Eleazar Eskin

Authors

Jong Wha J. Joo
View author publications
You can also search for this author in PubMed Google Scholar
Eun Yong Kang
View author publications
You can also search for this author in PubMed Google Scholar
Elin Org
View author publications
You can also search for this author in PubMed Google Scholar
Nick Furlotte
View author publications
You can also search for this author in PubMed Google Scholar
Brian Parks
View author publications
You can also search for this author in PubMed Google Scholar
Aldons J. Lusis
View author publications
You can also search for this author in PubMed Google Scholar
Eleazar Eskin
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Eleazar Eskin .

Editor information

Editors and Affiliations

National Center of Biotechnology Information, Bethesda, Maryland, USA
Teresa M. Przytycka

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Joo, J.W.J. et al. (2015). Efficient and Accurate Multiple-Phenotypes Regression Method for High Dimensional Data Considering Population Structure. In: Przytycka, T. (eds) Research in Computational Molecular Biology. RECOMB 2015. Lecture Notes in Computer Science(), vol 9029. Springer, Cham. https://doi.org/10.1007/978-3-319-16706-0_15

Download citation

DOI: https://doi.org/10.1007/978-3-319-16706-0_15
Published: 26 March 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-16705-3
Online ISBN: 978-3-319-16706-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics