Abstract
A wide assortment of R tools is available for exploratory data analysis in high-dimensional settings, and these tools are easily applied to data arising from population-based genetic association studies. In this chapter we illustrate the application of three such approaches: conditional inference trees, random forests, and logic regression. Through applications to simulated data, we explore the relative utility of each approach for uncovering underlying associations between genetic polymorphisms and a quantitative trait.
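As a rough illustration of the kind of analysis the abstract describes, the sketch below simulates a small SNP data set and fits each of the three methods to it. It is not the chapter's own code. The sample size, allele frequency, effect size, the dominant (carrier) coding of genotypes, and all variable names are assumptions made for this example. The calls rely on the CRAN packages `party`, `randomForest`, and `LogicReg`, and each fit is skipped if its package is not installed.

```r
# Hypothetical simulation (all settings are assumptions for illustration):
# 10 SNPs coded as minor-allele counts (0/1/2) for 500 individuals.
set.seed(1)
n <- 500
p <- 10
geno <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n,
               dimnames = list(NULL, paste0("snp", 1:p)))

# Dominant coding: 1 if the individual carries at least one minor allele.
# Logic regression requires binary predictors, so this coding is used throughout.
carrier <- (geno > 0) * 1

# Quantitative trait driven by an interaction between snp1 and snp2, plus noise.
trait <- 1.5 * (carrier[, "snp1"] & carrier[, "snp2"]) + rnorm(n)
dat <- data.frame(trait = trait, carrier)

# Conditional inference tree (party package).
if (requireNamespace("party", quietly = TRUE)) {
  fit_tree <- party::ctree(trait ~ ., data = dat)
  print(fit_tree)
}

# Random forest with permutation-based variable importance (randomForest package).
if (requireNamespace("randomForest", quietly = TRUE)) {
  fit_rf <- randomForest::randomForest(x = dat[, -1], y = dat$trait,
                                       ntree = 500, importance = TRUE)
  print(randomForest::importance(fit_rf))
}

# Logic regression (LogicReg package): type = 2 requests a linear regression
# model, select = 1 fits a single model with the stated number of trees and leaves.
if (requireNamespace("LogicReg", quietly = TRUE)) {
  fit_lr <- LogicReg::logreg(resp = dat$trait, bin = carrier,
                             type = 2, select = 1, ntrees = 1, nleaves = 4)
  print(fit_lr)
}
```

Because the simulated trait depends only on `snp1` and `snp2`, one would hope to see those two SNPs appear in the tree splits, near the top of the importance ranking, and in the fitted logic tree. Results will vary with the seed and the chosen settings.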
Copyright information
© 2011 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Foulkes, A.S., Au, K. (2011). R Statistical Tools for Gene Discovery. In: Yu, B., Hinchcliffe, M. (eds) In Silico Tools for Gene Discovery. Methods in Molecular Biology, vol 760. Humana Press. https://doi.org/10.1007/978-1-61779-176-5_5
DOI: https://doi.org/10.1007/978-1-61779-176-5_5
Publisher Name: Humana Press
Print ISBN: 978-1-61779-175-8
Online ISBN: 978-1-61779-176-5
eBook Packages: Springer Protocols