Modeling of Biochemical Networks via Classification and Regression Tree Methods
Many modeling approaches, based on different assumptions, have been suggested for describing biological networks. The major problems in these models and their associated inference approaches are the complexity of biological systems, which leads to a large number of model parameters, the few observations available for each variable in the system, the sparse structure of these systems, and the high correlation between model parameters. Recent studies show that nonparametric methods can ameliorate these challenges and offer a strong alternative. Furthermore, not only regression-type nonparametric models but also nonparametric clustering methods whose calculations are adapted to biochemical systems appear to be a promising choice. In this study, we propose the classification and regression tree (CART) method as a new approach for constructing complex systems when the system’s activity is described under its steady-state condition. Basically, CART is a classification technique for highly correlated data and can be represented as the nonparametric version of the generalized additive model. In this work, we use CART to construct biological modules and then networks. We analyze the performance of CART comprehensively under various Monte Carlo scenarios, such as different data distributions and dimensions. We compare our results with the outputs of the Gaussian graphical model (GGM), which is the most well-known model under the given condition of the system. We also evaluate the performance of CART against the GGM findings on real systems. For this purpose, we choose pathways that play a crucial role in cervical cancer. We consider this particular illness because, after breast cancer, it is the second most common cancer type in women both in Turkey and worldwide, and only limited information is available for describing this complex system disease.
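To make the idea of building a network from tree-based split questions concrete, the following is a minimal sketch and not the procedure used in this study. It assumes a one-split regression tree (a stump) per pair of variables, scores the best split question of the form "x <= c?" by the fraction of the target's sum of squared errors it removes, and draws an undirected edge when that fraction reaches a cutoff. The function names, the cutoff of 0.3, and the toy values for `a`, `b`, and `c` are all hypothetical and chosen only for illustration.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split_gain(x, y):
    """Fraction of the SSE of y removed by the best single split 'x <= c?'."""
    total = sse(y)
    if total == 0:
        return 0.0
    pairs = sorted(zip(x, y))
    best = 0.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        best = max(best, total - sse(left) - sse(right))
    return best / total


def stump_network(data, cutoff=0.3):
    """Connect two variables if either one splits the other well enough.

    data maps a variable name to its list of observations.
    The cutoff is an illustrative assumption, not a recommended value.
    """
    names = sorted(data)
    edges = set()
    for target in names:
        for predictor in names:
            if predictor == target:
                continue
            if best_split_gain(data[predictor], data[target]) >= cutoff:
                edges.add(frozenset((target, predictor)))
    return edges


# Made-up toy observations: b tracks a closely, c is unrelated to both.
data = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8],
    "b": [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1],
    "c": [5, 1, 4, 2, 5, 1, 4, 2],
}
edges = stump_network(data)
print(sorted(sorted(edge) for edge in edges))
```

On this toy input the only pair that passes the cutoff is `a` and `b`. A full CART fit would grow and prune deeper trees and could use other split rules, so this stump should be read only as an illustration of how a split question can signal dependence between two nodes.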
Keywords: Classification and regression tree (CART); Gaussian graphical models (GGM); Twoing rule; Gini rule; Split question
The authors thank the BAP project (no: BAP-01-09-2016-002) and DAP project (no: BAP-08-11-2017-035) at the Middle East Technical University for their support.