Modeling of Biochemical Networks via Classification and Regression Tree Methods

Part of the book series: Nonlinear Systems and Complexity (NSCH, volume 24)

Abstract

In the description of biological networks, a number of modeling approaches have been suggested, each based on different assumptions. The major problems for these models and their associated inference approaches are the complexity of biological systems, which results in a high number of model parameters; the few observations available for each variable in the system; the sparse structure of these systems; and the high correlation between model parameters. Recent studies have shown that nonparametric methods can ameliorate these challenges and serve as a strong alternative. Furthermore, it has been observed that not only regression-type nonparametric models but also nonparametric clustering methods whose calculations are adapted to biochemical systems can be a promising choice. In this study, we therefore propose the classification and regression tree (CART) method as a new approach for constructing complex systems whose activity is described under the steady-state condition. Basically, CART is a classification technique for highly correlated data and can be represented as the nonparametric version of the generalized additive model. In this work, we use CART to construct biological modules and then networks. We analyze the performance of CART comprehensively under various Monte Carlo scenarios, such as different data distributions and dimensions, and compare our results with the outputs of the Gaussian graphical model (GGM), which is the most well-known model under the given condition of the system. We also evaluate the performance of CART against the GGM findings on real systems. For this purpose, we choose pathways that have a crucial role in cervical cancer. We consider this particular illness because it is the second most common cancer type in women, after breast cancer, both in Turkey and in the world, and because only limited information is available for describing this complex disease.
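The abstract does not spell out how CART output is turned into a network, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes that each node is regressed on every other node with a depth-one regression tree (a single CART split), and that an undirected edge is drawn when the split removes at least a chosen fraction of the target's variance. The function names, the `threshold` of 0.25, the `min_leaf` size, and the toy data are all hypothetical. Because each predictor is scored on its own, the sketch captures marginal association only, whereas a full tree grown on all predictors, like the GGM, targets conditional dependence.

```python
import random
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (CART regression impurity)."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def stump_gain(x, y, min_leaf=10):
    """Fraction of the variance of y removed by the best single split on x."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    total = sse(ys)
    if total == 0.0:
        return 0.0
    best = 0.0
    # Try every split that leaves at least min_leaf points on each side.
    for k in range(min_leaf, len(ys) - min_leaf + 1):
        if xs[k - 1] == xs[k]:
            continue  # tied values cannot be separated by a threshold
        reduction = total - sse(ys[:k]) - sse(ys[k:])
        best = max(best, reduction)
    return best / total


def cart_network(data, threshold=0.25):
    """Return undirected edges between variables in `data` (name -> samples).

    An edge is kept if the split passes the threshold in either direction
    (an assumed "OR" rule).
    """
    names = sorted(data)
    edges = set()
    for target in names:
        for predictor in names:
            if predictor == target:
                continue
            if stump_gain(data[predictor], data[target]) >= threshold:
                edges.add(frozenset((predictor, target)))
    return edges


# Toy steady-state data (hypothetical): B depends on A, C is independent.
rng = random.Random(0)
a = [rng.gauss(0, 1) for _ in range(200)]
b = [v + rng.gauss(0, 0.3) for v in a]
c = [rng.gauss(0, 1) for _ in range(200)]
edges = cart_network({"A": a, "B": b, "C": c})
print(sorted(tuple(sorted(e)) for e in edges))
```

In this toy setting the estimated edge set can be compared with the known structure (A linked to B, C isolated), which mirrors in miniature the Monte Carlo comparison against a reference network described in the abstract.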

Acknowledgements

The authors thank the BAP project (no: BAP-01-09-2016-002) and DAP project (no: BAP-08-11-2017-035) at the Middle East Technical University for their support.

Author information

Corresponding author

Correspondence to Vilda Purutçuoğlu.

Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Seçilmiş, D., Purutçuoğlu, V. (2019). Modeling of Biochemical Networks via Classification and Regression Tree Methods. In: Taş, K., Baleanu, D., Machado, J. (eds) Mathematical Methods in Engineering. Nonlinear Systems and Complexity, vol 24. Springer, Cham. https://doi.org/10.1007/978-3-319-90972-1_7

  • DOI: https://doi.org/10.1007/978-3-319-90972-1_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-90971-4

  • Online ISBN: 978-3-319-90972-1

  • eBook Packages: Engineering, Engineering (R0)
