Modeling of Biochemical Networks via Classification and Regression Tree Methods

  • Deniz Seçilmiş
  • Vilda Purutçuoğlu
Chapter
Part of the Nonlinear Systems and Complexity book series (NSCH, volume 24)

Abstract

A number of modeling approaches, based on different assumptions, have been suggested for describing biological networks. The major problems for these models and their associated inference approaches are the complexity of biological systems, which results in a high number of model parameters; the few observations available for each variable in the system; the sparse structure of the systems; and the high correlation between model parameters. Recent studies indicate that nonparametric methods can ameliorate these challenges and serve as a strong alternative. Furthermore, not only regression-type nonparametric models but also nonparametric clustering methods whose calculations are adapted to biochemical systems can be a promising choice. In this study, we therefore propose the classification and regression tree (CART) method as a new approach for constructing complex systems when the system’s activity is described under its steady-state condition. In essence, CART is a classification technique for highly correlated data and can be represented as the nonparametric version of the generalized additive model. In this work, we use CART to construct biological modules and then networks. We analyze the performance of CART comprehensively under various Monte Carlo scenarios, such as different data distributions and dimensions, and compare our results with the outputs of the Gaussian graphical model (GGM), which is the best-known model under the given condition of the system. We also evaluate the performance of CART against the GGM findings on real systems. For this purpose, we choose pathways that play a crucial role in cervical cancer. We consider this particular illness because it is the second most common cancer type in women, after breast cancer, both in Turkey and worldwide, and because only limited information is available for describing this complex system disease.
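
The abstract does not spell out how tree output is turned into network edges, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes a node-wise scheme: each variable is treated in turn as the response, a depth-one regression tree (a single CART least-squares split) is tried on every other variable, and an edge is drawn when that split removes a chosen share of the response's variation. The function names, the toy data, and the `min_share` threshold are all hypothetical.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_split_gain(x, y):
    """Largest drop in the SSE of y achievable by one split 'x <= t'."""
    total = sse(y)
    best = 0.0
    # Every observed value except the largest is a candidate threshold.
    for t in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        best = max(best, total - sse(left) - sse(right))
    return best


def stump_network(rows, names, min_share=0.5):
    """Assumed rule: link j and k if a single split on k removes at least
    min_share of the variation in j, in either direction (OR rule)."""
    cols = list(zip(*rows))  # one tuple of observations per variable
    edges = set()
    for j, target in enumerate(cols):
        total = sse(target)
        if total == 0:
            continue  # a constant variable cannot be explained
        for k, predictor in enumerate(cols):
            if k == j:
                continue
            if best_split_gain(predictor, target) / total >= min_share:
                edges.add(frozenset((names[j], names[k])))
    return edges


# Hypothetical toy data: B tracks A closely, C alternates independently.
rows = [
    (1, 1.1, 3), (2, 2.0, 1), (3, 2.9, 3), (4, 4.2, 1),
    (5, 5.1, 3), (6, 5.9, 1), (7, 7.0, 3), (8, 8.1, 1),
]
edges = stump_network(rows, ["A", "B", "C"])
```

A full CART fit would grow deeper trees and prune them; the single split here only illustrates the splitting criterion and the node-wise idea, and the 0.5 cut-off is an arbitrary choice for the toy data.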

Keywords

  • Classification And Regression Tree (CART)
  • Gaussian Graphical Models (GGM)
  • Twoing Rule
  • Gini Rule
  • Split Question
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgements

The authors thank the BAP project (no: BAP-01-09-2016-002) and DAP project (no: BAP-08-11-2017-035) at the Middle East Technical University for their support.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2019

Authors and Affiliations

  1. Informatics Institute, Middle East Technical University, Ankara, Turkey
  2. Department of Statistics, Middle East Technical University, Ankara, Turkey
