An Optimum Random Forest Model for Prediction of Genetic Susceptibility to Complex Diseases

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4426)

Abstract

High-throughput single nucleotide polymorphism (SNP) genotyping technologies have made massive genotype data sets, covering large numbers of individuals, publicly available. This accessibility of genetic data makes genome-wide association studies for complex diseases possible. One of the most challenging issues in genome-wide association studies is to search for and analyze genetic risk factors resulting from interactions of multiple genes. Such integrated risk factors usually have a higher risk rate than single SNPs. This paper explores the possibility of applying random forest to search for disease-associated factors in given case/control samples. An optimum random forest based algorithm is proposed for the disease susceptibility prediction problem. The proposed method has been applied to publicly available genotype data on Crohn’s disease and autoimmune disorders to predict susceptibility to these diseases. The prediction accuracy achieved is higher than that of universal prediction methods such as Support Vector Machine (SVM) and of previously known methods.
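
The abstract does not spell out the proposed algorithm, so the following is only a minimal sketch of a plain random forest applied to case/control genotype data, not the authors' optimum variant. It assumes genotypes coded 0/1/2 per SNP and uses simulated data in which disease risk depends on an interaction between two SNPs; every function name, parameter value, and risk probability here is a hypothetical choice made for illustration.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def build_tree(X, y, rows, n_try, depth, rng):
    """Grow one tree; a leaf is an int label, an inner node is a tuple."""
    labels = [y[i] for i in rows]
    majority = int(2 * sum(labels) >= len(labels))
    if depth == 0 or len(set(labels)) == 1:
        return majority
    best = None
    # Random forest step: consider only a random subset of SNPs per node.
    for snp in rng.sample(range(len(X[0])), n_try):
        for cut in (0, 1):  # split on genotype <= cut versus genotype > cut
            left = [i for i in rows if X[i][snp] <= cut]
            right = [i for i in rows if X[i][snp] > cut]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, snp, cut, left, right)
    if best is None:
        return majority
    _, snp, cut, left, right = best
    return (snp, cut,
            build_tree(X, y, left, n_try, depth - 1, rng),
            build_tree(X, y, right, n_try, depth - 1, rng))


def predict_tree(node, genotype):
    """Walk from the root to a leaf and return its label."""
    while isinstance(node, tuple):
        snp, cut, low, high = node
        node = low if genotype[snp] <= cut else high
    return node


def fit_forest(X, y, n_trees=50, n_try=4, depth=4, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the individuals."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree(X, y, rows, n_try, depth, rng))
    return forest


def predict_forest(forest, genotype):
    """Majority vote over all trees: 1 = case, 0 = control."""
    votes = sum(predict_tree(tree, genotype) for tree in forest)
    return int(2 * votes > len(forest))


def simulate(n, n_snps, rng):
    """Simulated data (assumption): risk is high only when SNP 0 and SNP 1
    both carry at least one minor allele."""
    X, y = [], []
    for _ in range(n):
        g = [rng.choice((0, 1, 2)) for _ in range(n_snps)]
        risk = 0.9 if (g[0] > 0 and g[1] > 0) else 0.1
        X.append(g)
        y.append(int(rng.random() < risk))
    return X, y


if __name__ == "__main__":
    rng = random.Random(1)
    X_train, y_train = simulate(300, 10, rng)
    X_test, y_test = simulate(200, 10, rng)
    forest = fit_forest(X_train, y_train)
    hits = sum(predict_forest(forest, g) == label
               for g, label in zip(X_test, y_test))
    print("held-out accuracy:", hits / len(y_test))
```

Restricting each split to a random subset of SNPs and training each tree on a bootstrap sample are the two ingredients that distinguish a random forest from a single decision tree. Whether the paper tunes these or other parameters to obtain its "optimum" model is not stated in the abstract.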

Editor information

Zhi-Hua Zhou, Hang Li, Qiang Yang

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Mao, W., Kelly, S. (2007). An Optimum Random Forest Model for Prediction of Genetic Susceptibility to Complex Diseases. In: Zhou, ZH., Li, H., Yang, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4426. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71701-0_21

  • DOI: https://doi.org/10.1007/978-3-540-71701-0_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71700-3

  • Online ISBN: 978-3-540-71701-0

  • eBook Packages: Computer Science, Computer Science (R0)
