
Unsupervised Gene Network Inference with Decision Trees and Random Forests


Part of the book series: Methods in Molecular Biology (MIMB, volume 1883)

Abstract

In this chapter, we introduce the reader to a popular family of machine learning algorithms called decision trees. We then review several decision tree-based approaches that have been developed for the inference of gene regulatory networks (GRNs). Decision trees have several properties that make them well suited to this problem: they can detect multivariate interacting effects between variables, they are non-parametric, they scale well, and they have very few parameters to tune. In particular, we describe in detail the GENIE3 algorithm, a state-of-the-art method for GRN inference.
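The core idea behind GENIE3 can be illustrated with a short sketch. Each gene in turn is treated as a regression target. A tree ensemble is trained to predict that gene's expression from the expression of all other genes, and the resulting variable importances serve as scores for the putative regulatory links. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation. The function names `genie3` and `_grow` and all parameter names are hypothetical. Trees are grown on bootstrap samples, with `k` randomly chosen candidate regulators per node (Random Forest style randomisation). Importance is the total reduction of squared error achieved by the splits on each regulator, and every target is standardised to unit variance so that scores are comparable across targets.

```python
import random


def _grow(X, y, idx, feats, k, min_leaf, imp, rng):
    """Recursively grow one regression tree on the rows listed in `idx`.

    The tree itself is not stored: each split's reduction of the sum of
    squared errors is added to imp[feature], which is the impurity-based
    variable importance used here to score links.
    """
    n = len(idx)
    if n < 2 * min_leaf:
        return
    tot = sum(y[i] for i in idx)
    tot_sq = sum(y[i] ** 2 for i in idx)
    parent_sse = tot_sq - tot * tot / n
    best_gain, best_f, best_pos, best_order = 1e-12, None, None, None
    # Random Forest style: only k randomly drawn candidate regulators per node.
    for f in rng.sample(feats, min(k, len(feats))):
        order = sorted(idx, key=lambda i: X[i][f])
        s = sq = 0.0
        for pos in range(1, n):
            yi = y[order[pos - 1]]  # move one sample into the left child
            s += yi
            sq += yi * yi
            if pos < min_leaf or n - pos < min_leaf:
                continue
            if X[order[pos - 1]][f] == X[order[pos]][f]:
                continue  # no threshold can separate tied values
            left_sse = sq - s * s / pos
            right_sse = (tot_sq - sq) - (tot - s) ** 2 / (n - pos)
            gain = parent_sse - left_sse - right_sse
            if gain > best_gain:
                best_gain, best_f, best_pos, best_order = gain, f, pos, order
    if best_f is None:
        return  # leaf: no admissible split reduces the error
    imp[best_f] += best_gain
    _grow(X, y, best_order[:best_pos], feats, k, min_leaf, imp, rng)
    _grow(X, y, best_order[best_pos:], feats, k, min_leaf, imp, rng)


def genie3(expr, n_trees=100, k=None, min_leaf=1, seed=0):
    """Score putative links regulator -> target from an expression matrix.

    expr: list of samples, each a list with one expression value per gene.
    Returns (regulator, target, score) tuples sorted by decreasing score.
    """
    rng = random.Random(seed)
    n_samples, n_genes = len(expr), len(expr[0])
    links = []
    for target in range(n_genes):
        y = [row[target] for row in expr]
        mean = sum(y) / n_samples
        sd = (sum((v - mean) ** 2 for v in y) / n_samples) ** 0.5
        # Standardise the target so importances are comparable across targets.
        y = [(v - mean) / sd if sd > 0 else 0.0 for v in y]
        feats = [g for g in range(n_genes) if g != target]  # candidate regulators
        kk = k if k is not None else max(1, round(len(feats) ** 0.5))
        imp = [0.0] * n_genes
        for _ in range(n_trees):
            boot = [rng.randrange(n_samples) for _ in range(n_samples)]  # bootstrap
            _grow(expr, y, boot, feats, kk, min_leaf, imp, rng)
        links += [(g, target, imp[g] / n_trees) for g in feats]
    links.sort(key=lambda link: link[2], reverse=True)
    return links
```

Calling `genie3(data)` on a list of samples returns a global ranking of candidate edges. A network can then be obtained by keeping the top-ranked edges or by thresholding the scores. The pure-Python tree grower only keeps the example self-contained. For real data sets, an optimised tree library such as scikit-learn's `RandomForestRegressor` or `ExtraTreesRegressor`, which expose `feature_importances_`, would be the natural choice.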



Acknowledgements

VAHT is a Post-doctoral Fellow of the F.R.S.-FNRS.

Author information


Corresponding author

Correspondence to Vân Anh Huynh-Thu.


Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol


Cite this protocol

Huynh-Thu, V.A., Geurts, P. (2019). Unsupervised Gene Network Inference with Decision Trees and Random Forests. In: Sanguinetti, G., Huynh-Thu, V. (eds) Gene Regulatory Networks. Methods in Molecular Biology, vol 1883. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8882-2_8


  • DOI: https://doi.org/10.1007/978-1-4939-8882-2_8


  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-8881-5

  • Online ISBN: 978-1-4939-8882-2

  • eBook Packages: Springer Protocols
