Abstract
Bayesian variable selection provides a principled framework for incorporating prior information to regularize parameters in high-dimensional large-p-small-n regression models such as genome-wide association studies (GWAS). Although these models produce more informative results, researchers often disregard them in favor of simpler models because of their high computational cost. We explore our recently proposed spatial boost model for GWAS on quantitative traits to assess the computational efficiency of a more representative model. The spatial boost model is a Bayesian hierarchical model that exploits spatial information on the genome to define prior probabilities of association for genetic markers based on their proximity to relevant genes. We propose analyzing large data sets by first applying an expectation–maximization filter to reduce the dimensionality of the space and then running an efficient Gibbs sampler on the remaining markers. Finally, we conduct a thorough simulation study based on real genotypes provided by the Wellcome Trust Case Control Consortium and compare our model to single association tests.
Acknowledgments
IJ and LC were partially supported by NSF grant DMS-1107067. This study makes use of data generated by the Wellcome Trust Case Control Consortium. A full list of the investigators who contributed to the generation of the data is available from wtccc.org.uk.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Johnston, I., Jin, Y., Carvalho, L. (2015). Assessing a Spatial Boost Model for Quantitative Trait GWAS. In: Polpo, A., Louzada, F., Rifo, L., Stern, J., Lauretto, M. (eds) Interdisciplinary Bayesian Statistics. Springer Proceedings in Mathematics & Statistics, vol 118. Springer, Cham. https://doi.org/10.1007/978-3-319-12454-4_28
DOI: https://doi.org/10.1007/978-3-319-12454-4_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12453-7
Online ISBN: 978-3-319-12454-4
eBook Packages: Mathematics and Statistics (R0)