Mathematical Methods in Engineering, pp 87-102
Modeling of Biochemical Networks via Classification and Regression Tree Methods
Abstract
In the description of biological networks, a number of modeling approaches have been suggested, each based on different assumptions. The major problems in these models and their associated inference approaches are the complexity of biological systems, which results in a high number of model parameters, the few observations available for each variable in the system, the sparse structure of the systems, and the high correlation between model parameters. Recent studies indicate that nonparametric methods can ameliorate these challenges and offer a strong alternative. Furthermore, it has been observed that not only regression-type nonparametric models but also nonparametric clustering methods whose calculations are adapted to biochemical systems can be a promising choice. In this study, we propose the classification and regression tree (CART) method as a new approach to the construction of complex systems when the system's activity is described under its steady-state condition. Basically, CART is a classification technique for highly correlated data and can be represented as the nonparametric version of the generalized additive model. In this work, we use CART to construct biological modules and, from them, networks. We analyze the performance of CART comprehensively under various Monte Carlo scenarios, such as different data distributions and dimensions. We compare our results with the outputs of the Gaussian graphical model (GGM), which is the best-known model under the given condition of the system. We also evaluate the performance of CART against the GGM findings by using real systems. For this purpose, we choose pathways that have a crucial role in cervical cancer. We consider this particular illness since it is the second most common cancer type in women, after breast cancer, both in Turkey and in the world, and only limited information is available for the description of this complex systems disease.
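As a rough illustration of how a tree-based split can suggest links between the variables of a system, the following minimal sketch regresses each variable on every other variable with a single CART-style regression split (a one-level tree chosen by reduction in the sum of squared errors) and draws an edge when the best split explains a large share of the target's variation. This is not the procedure of the chapter: the one-split simplification, the function names, the toy data and the `min_gain_ratio` threshold of 0.5 are all assumptions made for illustration, and a full CART analysis would grow and prune deeper trees.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Best single split of y on x: returns (threshold, SSE reduction)."""
    total = sse(y)
    best_thr, best_gain = None, 0.0
    levels = sorted(set(x))
    # Candidate thresholds are midpoints between consecutive observed levels.
    for lo, hi in zip(levels, levels[1:]):
        thr = (lo + hi) / 2.0
        left = [yi for xi, yi in zip(x, y) if xi <= thr]
        right = [yi for xi, yi in zip(x, y) if xi > thr]
        gain = total - sse(left) - sse(right)
        if gain > best_gain:
            best_thr, best_gain = thr, gain
    return best_thr, best_gain


def stump_network(data, min_gain_ratio=0.5):
    """Node-wise one-split regression; returns a set of undirected edges.

    data maps a variable name to its list of observations.
    min_gain_ratio is a hypothetical cut-off, not a value from the chapter.
    """
    edges = set()
    names = sorted(data)
    for target in names:
        total = sse(data[target])
        if total == 0.0:
            continue  # a constant variable cannot be explained by a split
        best_name, best_gain = None, 0.0
        for pred in names:
            if pred == target:
                continue
            _, gain = best_split(data[pred], data[target])
            if gain > best_gain:
                best_name, best_gain = pred, gain
        if best_name is not None and best_gain / total >= min_gain_ratio:
            edges.add(frozenset((target, best_name)))
    return edges


# Toy data (invented): B closely tracks A, C is unrelated to both.
toy = {
    "A": [1, 2, 3, 4, 5, 6, 7, 8],
    "B": [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1],
    "C": [5, 1, 4, 2, 5, 1, 4, 2],
}
edges = stump_network(toy)
print(sorted(sorted(edge) for edge in edges))
```

On the toy data only the pair A and B passes the threshold, which mirrors the idea of recovering a sparse network from strongly related variables. A comparison with a GGM would additionally require a sparse precision-matrix estimate, which is outside the scope of this sketch.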
Keywords
Classification And Regression Tree (CART), Gaussian Graphical Models (GGM), Twoing Rule, Gini Rule, Split Question
Notes
Acknowledgements
The authors thank the BAP project (no: BAP-01-09-2016-002) and DAP project (no: BAP-08-11-2017-035) at the Middle East Technical University for their support.