Abstract
Tree-based methods have become one of the most flexible, intuitive, and powerful data analytic tools for exploring complex data structures. Their applications are far-reaching, spanning financial firms (credit cards: Altman 2002; Frydman et al. 2002; investments: Pace 1995; Brennan et al. 2001), manufacturing and marketing companies (Levin et al. 1995), and pharmaceutical companies. Over the past decades, there have also been many applications in genomics and bioinformatics (Zhang et al. 2001).
References
Altman, E.I.: Bankruptcy, Credit Risk and High Yield Junk Bonds. Blackwell Publishers, Malden, MA (2002)
Alon, U., Barkai, N., Notterman, D.A., Gish, K., Ybarra, S., Mack, D., Levine, A.J.: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. 96, 6745–6750 (1999)
Bacchetti, P., Segal, M.R.: Survival trees with time-dependent covariates: application to estimating changes in the incubation period of AIDS. Lifetime Data Anal. 1, 35–47 (1995)
Bahl, L.R., Brown, P.F., de Sousa, P.V., Mercer, R.L.: A tree-based language model for natural language speech recognition. IEEE Trans. Acoust. Speech Signal Process. 37, 1001–1008 (1989)
Banerjee, M., Biswas, D., Sakr, W., Wood, D.P. Jr.: Recursive partitioning for prognostic grouping of patients with clinically localized prostate carcinoma. Cancer 89, 404–411 (2000)
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth, Belmont, California (1984)
Breiman, L.: Bagging predictors. Mach. Learn. 26, 123–140 (1994)
Breiman, L.: Bagging predictors. Mach. Learn. 24, 123–140 (1996)
Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001)
Brennan, N., Parameswaran, P., et al.: A Method for Selecting Stocks within Sectors. Schroder Salomon Smith Barney (2001)
Bühlmann, P., Yu, B.: Boosting with the L2 loss: Regression and classification. J. Am. Stat. Assoc. 98, 324–339 (2003)
Bühlmann, P., Yu, B.: Analyzing bagging. Ann. Stat. 30, 927–961 (2002)
Carmelli, D., Halpern, J., Swan, G.E., Dame, A., McElroy, M., Gelb, A.B., Rosenman, R.H.: 27-year mortality in the western collaborative group study: construction of risk groups by recursive partitioning. J. Clin. Epidemiol. 44, 1341–1351 (1991)
Carmelli, D., Zhang, H.P., Swan, G.E.: Obesity and 33 years of coronary heart disease and cancer mortality in the western collaborative group study. Epidemiology 8, 378–383 (1997)
Chen, X., Liu, C.T., Zhang, M., Zhang, H.: A forest-based approach to identifying gene and gene–gene interactions. Proc. Natl. Acad. Sci. 104, 19199–19203 (2007)
Chen, X., Rusinko, A., Young, S.S.: Recursive partitioning analysis of a large structure-activity data set using three-dimensional descriptors. J. Chem. Inform. Comput. Sci. 38, 1054–1062 (1998)
Ciampi, A., Couturier, A., Li, S.L.: Prediction trees with soft nodes for binary outcomes. Stat. Med. 21, 1145–1165 (2002)
Ciampi, A., Hogg, S., McKinney, S., Thiffault, J.: A computer program for recursive partition and amalgamation for censored survival data. Comput. Meth. Programs Biomed. 26, 239–256 (1988)
Ciampi, A., Thiffault, J., Nakache, J.-P., Asselain, B.: Stratification by stepwise regression, correspondence analysis and recursive partition: A comparison of three methods of analysis for survival data with covariates. Comput. Stat. Data Anal. 4, 185–204 (1986)
Cox, D.R.: Regression models and life-tables (with discussion). J. R. Stat. Soc. B 34, 187–220 (1972)
Cox, D.R.: The analysis of multivariate binary data. Appl. Stat. 21, 113–120 (1972)
Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, Cambridge (2000)
Crowley, J., LeBlanc, M., Gentleman, R., Salmon, S.: Exploratory methods in survival analysis. In: Koul, H.L., Deshpande, J.V. (eds.) IMS Lecture Notes – Monograph Series 27, pp. 55–77. IMS, Hayward, CA (1995)
Crowley, J., LeBlanc, M., Jacobson, J., Salmon, S.: Some exploratory methods for survival data. In: Lin, D.Y., Fleming, T.R. (eds.) Proceedings of the First Seattle Symposium in Biostatistics. Springer, New York (1997)
Davis, R., Anderson, J.: Exponential survival trees. Stat. Med. 8, 947–962 (1989)
Desilva, G.L., Hull, J.J.: Proper noun detection in document images. Pattern Recogn. 27, 311–320 (1994)
Diggle, P.J., Liang, K.Y., Zeger, S.L.: Analysis of Longitudinal Data. Oxford Science Publications, New York (1994)
Donoho, D.L.: CART and best-ortho-basis: A connection. Ann. Stat. 25, 1870–1911 (1997)
Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugenics 7, 179–188 (1936)
Fitzmaurice, G., Laird, N.M.: A likelihood-based method for analyzing longitudinal binary responses. Biometrika 80, 141–151 (1993)
Fox, S.H., Whalen, G.F., Sanders, M.M., Burleson, J.A., Jennings, K., Kurtzman, S., Kreutzer, D.: Angiogenesis in normal tissue adjacent to colon cancer. J. Surg. Oncol. 69, 230–234 (1998)
Friedman, J.H.: A recursive partitioning decision rule for nonparametric classification. IEEE Trans. Comput. C-26, 404–407 (1977)
Frydman, H., Altman, E.I., Kao, D.-L.: Introducing recursive partitioning for financial classification: The case of financial distress. In: Altman, E.I. (ed.) Bankruptcy, Credit Risk and High Yield Junk Bonds, pp. 37–59. Blackwell Publishers, Malden, MA (2002)
Geman, D., Jedynak, B.: An active testing model for tracking roads in satellite images. IEEE Trans. Pattern Anal. Mach. Intell. 18, 1–14 (1996)
Genuer, R., Poggi, J.M., Tuleau, C.: Random forests: some methodological insights. Rapport de Recherche, Institut National de Recherche en Informatique et en Automatique (2008)
Goldman, L., Cook, F., Johnson, P., Brand, D., Rouan, G., Lee, T.: Prediction of the need for intensive care in patients who come to emergency departments with acute chest pain. New Engl. J. Med. 334, 1498–1504 (1996)
Goldman, L., Weinberg, M., Olshen, R.A., Cook, F., Sargent, R. et al.: A computer protocol to predict myocardial infarction in emergency department patients with chest pain. New Engl. J. Med. 307, 588–597 (1982)
Gordon, L., Olshen, R.A.: Asymptotically efficient solutions to the classification problem. Ann. Stat. 6, 515–533 (1978)
Gordon, L., Olshen, R.A.: Consistent nonparametric regression from recursive partitioning schemes. J. Multivariate Anal. 10, 611–627 (1980)
Gordon, L., Olshen, R.A.: Almost surely consistent nonparametric regression from recursive partitioning schemes. J. Multivariate Anal. 15, 147–163 (1984)
Gordon, L., Olshen, R.A.: Tree-structured survival analysis. Canc. Treat. Rep. 69, 1065–1069 (1985)
Huang, X., Chen, S.D., Soong, S.J.: Piecewise exponential survival trees with time-dependent covariates. Biometrics 54, 1420–1433 (1998)
Inoue, K., Slaton, J.W., Karashima, T., Shuin, T., Sweeney, P., Millikan, R., Dinney, C.P.: The prognostic value of angiogenesis factor expression for predicting recurrence and metastasis of bladder cancer after neoadjuvant chemotherapy and radical cystectomy. Clin. Canc. Res. 6, 4866–4873 (2000)
Intrator, O., Kooperberg, C.: Trees and splines in survival analysis. Stat. Meth. Med. Res. 4, 237–262 (1995)
Ishwaran, H., Kogalur, U.B., Blackstone, E.H., Lauer, M.S.: Random survival forests. Ann. Appl. Stat. 2, 841–860 (2008)
Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22, 79–86 (1951)
Kwak, L.W., Halpern, J., Olshen, R.A., Horning, S.J.: Prognostic significance of actual dose intensity in diffuse large-cell lymphoma: results of a tree-structured survival analysis. J. Clin. Oncol. 8, 963–977 (1990)
LeBlanc, M., Crowley, J.: Relative risk trees for censored survival data. Biometrics 48, 411–425 (1992)
LeBlanc, M., Crowley, J.: Survival trees by goodness-of-split. J. Am. Stat. Assoc. 88, 457–467 (1993)
LeBlanc, M., Crowley, J.: A review of tree-based prognostic models. In: Thall, P.F. (ed.) Recent Advances in Clinical Trial Design and Analysis, pp. 113–124. Kluwer, New York (1995)
Levin, N., Zahavi, J., Olitsky, M.: Amos – A probability-driven, customer-oriented decision support system for target marketing of solo mailings. Eur. J. Oper. Res. 87, 708–721 (1995)
Lin, S., Cutler, D.J., Zwick, M.E., Chakravarti, A.: Haplotype inference in random population samples. Am. J. Hum. Genet. 71, 1129–1137 (2002)
Long, W.L., Griffith, J.L., Selker, H.P., D’Agostino, R.B.: A comparison of logistic regression to decision tree induction in a medical domain. Comput. Biomed. Res. 26, 74–97 (1993)
Lugosi, G., Nobel, A.B.: Consistency of data-driven histogram methods for density estimation and classification. Ann. Stat. 24, 687–706 (1996)
Miller, R.G.: Survival Analysis. Wiley, New York (1981)
Morgan, J.N., Sonquist, J.A.: Problems in the analysis of survey data and a proposal. J. Am. Stat. Assoc. 58, 415–434 (1963)
Nagata, K., Okano, Y., Nozawa, Y.: Differential expression of low Mr GTP-binding proteins in human megakaryoblastic leukemia cell line, MEG-01 and their possible involvement in the differentiation process. Thromb. Haemostasis 77, 368–375 (1997)
Nobel, A.B.: Histogram regression estimation using data-dependent partitions. Ann. Stat. 24, 1084–1105 (1996)
Nobel, A.B., Olshen, R.A.: Termination and continuity of greedy growing for tree structured vector quantizers. IEEE Trans. Inform. Theor. 42, 191–206 (1996)
Owens, E.A., Griffiths, R.E., Ratnatunga, K.U.: Using oblique decision trees for the morphological classification of galaxies. Mon. Not. Roy. Astron. Soc. 281, 153–157 (1996)
Pace, R.K.: Parametric, semiparametric and nonparametric estimation of characteristic values within mass assessment and hedonic pricing models. J. Real Estate Finance Econ. 11, 195–217 (1995)
Quinlan, J.R.: Unknown attribute values in induction. In: Proceedings of the Sixth International Machine Learning Workshop, Morgan Kaufmann, Cornell, New York (1989)
Segal, M.R.: Regression trees for censored data. Biometrics 44, 35–48 (1988)
Segal, M.R.: Tree-structured methods for longitudinal data. J. Am. Stat. Assoc. 87, 407–418 (1992)
Segal, M.R.: Extending the elements of tree-structured regression. Stat. Meth. Med. Res. 4, 219–236 (1995)
Segal, M.R., Bloch, D.A.: A comparison of estimated proportional hazards models and regression trees. Stat. Med. 8, 539–550 (1989)
Selker, H.P., Griffith, J.L., Patil, S., Long, W.L., D’Agostino, R.B.: A comparison of performance of mathematical predictive methods for medical diagnosis: Identifying acute cardiac ischemia among emergency department patients. J. Investig. Med. 43, 468–476 (1995)
Therneau, T.M., Grambsch, P.M., Fleming, T.R.: Martingale-based residuals for survival models. Biometrika 77, 147–160 (1990)
Toshina, K., Hirata, I., Maemura, K., Sasaki, S., Murano, M., Nitta, M., Yamauchi, H., Nishikawa, T., Hamamoto, N., Katsu, K.: Enprostil, a prostaglandin-E-2 analogue, inhibits interleukin-8 production of human colonic epithelial cell lines. Scand. J. Immunol. 52, 570–575 (2000)
Wang, M., Chen, X., Zhang, H.: Maximal conditional chi-square importance in random forests. Bioinformatics 26, 831–837 (2010)
Wasson, J.H., Sox, H.C., Neff, R.K., Goldman, L.: Clinical prediction rules: Applications and methodologic standards. New Engl. J. Med. 313, 793–799 (1985)
Yeates, L.C., Powis, G.: The expression of the molecular chaperone calnexin is decreased in cancer cells grown as colonies compared to monolayer. Biochem. Biophys. Res. Comm. 238, 66–70 (1997)
Zhang, H.P.: Splitting criteria in survival trees. In: Statistical Modelling: Proceedings of the 10th International Workshop on Statistical Modeling, pp. 305–314. Springer (1995)
Zhang, H.P.: Classification trees for multiple binary responses. J. Am. Stat. Assoc. 93, 180–193 (1998)
Zhang, H.P., Bracken, M.B.: Tree-based risk factor analysis of preterm delivery and small-for-gestational-age birth. Am. J. Epidemiol. 141, 70–78 (1995)
Zhang, H.P., Bracken, M.B.: Tree-based, two-stage risk factor analysis for spontaneous abortion. Am. J. Epidemiol. 144, 989–996 (1996)
Zhang, H.P., Crowley, J., Sox, H., Olshen, R.A.: Tree-structured statistical methods. In: Encyclopedia of Biostatistics, vol. 6, pp. 4561–4573. Wiley, Chichester, England (2001)
Zhang, H.P., Holford, T., Bracken, M.B.: A tree-based method of analysis for prospective studies. Stat. Med. 15, 37–49 (1996)
Zhang, H.P., Singer, B.: Recursive Partitioning and Its Applications. Springer, New York (2010)
Zhang, H.P., Yu, C.Y., Singer, B.: Cell and tumor classification using gene expression data: Construction of forests. Proc. Natl. Acad. Sci. 100, 4168–4172 (2003)
Zhang, H.P., Yu, C.Y., Singer, B., Xiong, M.M.: Recursive partitioning for tumor classification with gene expression microarray data. Proc. Natl. Acad. Sci. 98, 6730–6735 (2001)
Zhao, L.P., Prentice, R.L.: Correlated binary regression using a quadratic exponential model. Biometrika 77, 642–648 (1990)
Zhang, H., Wang, M.: Search for the smallest random forest. Stat. Interface 2, 381–388 (2009)
Acknowledgements
This research is supported in part by grant R01DA016750 from the National Institute on Drug Abuse.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Zhang, H. (2012). Recursive Partitioning and Tree-based Methods. In: Gentle, J., Härdle, W., Mori, Y. (eds) Handbook of Computational Statistics. Springer Handbooks of Computational Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21551-3_29
DOI: https://doi.org/10.1007/978-3-642-21551-3_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21550-6
Online ISBN: 978-3-642-21551-3
eBook Packages: Mathematics and Statistics (R0)