Adopting Multivariate Nonparametric Tools to Determine Genotype-Phenotype Interactions in Health and Disease

Montoliu, Ivan

doi:10.1007/978-1-4471-6539-2_3

Part of the book series: Molecular and Integrative Toxicology (MOLECUL)

Abstract

This chapter describes the role of machine learning approaches such as random forests in holistic discovery applications and provides background for understanding them. Their suitability for feature selection, data integration, and network modelling is also evaluated through recent examples from the literature. These examples cover a variety of fields, ranging from ecology to metabolomics.

Author information

Authors and Affiliations

Analytical Sciences, Applied Mathematics, Nestec Ltd., Nestlé Research Centre, 44, Route du Jorat 57, CH-1000, Lausanne 26, Switzerland
Ivan Montoliu

Corresponding author

Correspondence to Ivan Montoliu.

Editor information

Editors and Affiliations

Analytical Sciences Competence Pillar, Nestec SA, Nestlé Research Center, Lausanne, Switzerland
Sunil Kochhar
Molecular Biomarkers, Nestlé Institute of Health Sciences, Lausanne, Switzerland
François-Pierre Martin

About this chapter

Cite this chapter

Montoliu, I. (2015). Adopting Multivariate Nonparametric Tools to Determine Genotype-Phenotype Interactions in Health and Disease. In: Kochhar, S., Martin, FP. (eds) Metabonomics and Gut Microbiota in Nutrition and Disease. Molecular and Integrative Toxicology. Springer, London. https://doi.org/10.1007/978-1-4471-6539-2_3

DOI: https://doi.org/10.1007/978-1-4471-6539-2_3
Published: 19 September 2014
Publisher Name: Springer, London
Print ISBN: 978-1-4471-6538-5
Online ISBN: 978-1-4471-6539-2
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)
