R Statistical Tools for Gene Discovery

  • Protocol
In Silico Tools for Gene Discovery

Part of the book series: Methods in Molecular Biology (MIMB, volume 760)

Abstract

A wide assortment of R tools is available for exploratory data analysis in high-dimensional settings, and these tools are easily applied to data arising from population-based genetic association studies. In this chapter we illustrate the application of three such approaches: conditional inference trees, random forests, and logic regression. Through applications to simulated data, we explore the relative utility of each approach for uncovering the underlying association between genetic polymorphisms and a quantitative trait.


Author information

Corresponding author

Correspondence to Andrea S. Foulkes.

Copyright information

© 2011 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Foulkes, A.S., Au, K. (2011). R Statistical Tools for Gene Discovery. In: Yu, B., Hinchcliffe, M. (eds) In Silico Tools for Gene Discovery. Methods in Molecular Biology, vol 760. Humana Press. https://doi.org/10.1007/978-1-61779-176-5_5

  • DOI: https://doi.org/10.1007/978-1-61779-176-5_5

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-61779-175-8

  • Online ISBN: 978-1-61779-176-5

  • eBook Packages: Springer Protocols
