Recursive Partitioning and Tree-based Methods

Handbook of Computational Statistics

Part of the book series: Springer Handbooks of Computational Statistics (SHCS)

Abstract

Tree-based methods have become one of the most flexible, intuitive, and powerful data analytic tools for exploring complex data structures. Their applications are far-reaching. They are used by financial firms (credit cards: Altman 2002; Frydman et al. 2002; investments: Pace 1995; Brennan et al. 2001), manufacturing and marketing companies (Levin et al. 1995), and pharmaceutical companies. In the past decades, there have been many applications in genomics and bioinformatics (Zhang et al. 2001).

References

  • Altman, E.I.: Bankruptcy, Credit Risk and High Yield Junk Bonds. Blackwell Publishers, Malden, MA (2002)

  • Alon, U., Barkai, N., Notterman, D.A., Gish, K., Ybarra, S., Mack, D., Levine, A.J.: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. 96, 6745–6750 (1999)

  • Bacchetti, P., Segal, M.R.: Survival trees with time-dependent covariates: application to estimating changes in the incubation period of AIDS. Lifetime Data Anal. 1, 35–47 (1995)

  • Bahl, L.R., Brown, P.F., de Sousa, P.V., Mercer, R.L.: A tree-based language model for natural language speech recognition. IEEE Trans. Acoust. Speech Signal Process. 37, 1001–1008 (1989)

  • Banerjee, M., Biswas, D., Sakr, W., Wood, D.P. Jr.: Recursive partitioning for prognostic grouping of patients with clinically localized prostate carcinoma. Cancer 89, 404–411 (2000)

  • Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees, Wadsworth, Belmont, California (1984)

  • Breiman, L.: Bagging predictors. Mach. Learn. 26, 123–140 (1994)

  • Breiman, L.: Bagging predictors. Mach. Learn. 24, 123–140 (1996)

  • Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001)

  • Brennan, N., Parameswaran, P. et al.: A Method for Selecting Stocks within Sectors. Schroder Salomon Smith Barney (2001)

  • Bühlmann, P., Yu, B.: Boosting with the L-2 loss: Regression and classification. J. Am. Stat. Assoc. 98, 324–339 (2003)

  • Bühlmann, P., Yu, B.: Analyzing bagging. Ann. Stat. 30, 927–961 (2002)

  • Carmelli, D., Halpern, J., Swan, G.E., Dame, A., McElroy, M., Gelb, A.B., Rosenman, R.H.: 27-year mortality in the western collaborative group study: construction of risk groups by recursive partitioning. J. Clin. Epidemiol. 44, 1341–1351 (1991)

  • Carmelli, D., Zhang, H.P., Swan, G.E.: Obesity and 33 years of coronary heart disease and cancer mortality in the western collaborative group study. Epidemiology 8, 378–383 (1997)

  • Chen, X., Liu, C.T., Zhang, M., Zhang, H.: A forest-based approach to identifying gene and gene–gene interactions. Proc. Natl. Acad. Sci. USA 104, 19199–19203 (2007)

  • Chen, X., Rusinko, A., Young, S.S.: Recursive partitioning analysis of a large structure-activity data set using three-dimensional descriptors. J. Chem. Inform. Comput. Sci. 38, 1054–1062 (1998)

  • Ciampi, A., Couturier, A., Li, S.L.: Prediction trees with soft nodes for binary outcomes. Stat. Med. 21, 1145–1165 (2002)

  • Ciampi, A., Hogg, S., McKinney, S., Thiffault, J.: A computer program for recursive partition and amalgamation for censored survival data. Comput. Meth. Programs Biomed. 26, 239–256 (1988)

  • Ciampi, A., Thiffault, J., Nakache, J.-P., Asselain, B.: Stratification by stepwise regression, correspondence analysis and recursive partition: A comparison of three methods of analysis for survival data with covariates. Comput. Stat. Data Anal. 4, 185–204 (1986)

  • Cox, D.R.: Regression models and life-tables (with discussion). J. Roy. Stat. Soc. B 34, 187–220 (1972)

  • Cox, D.R.: The analysis of multivariate binary data. Appl. Stat. 21, 113–120 (1972)

  • Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, Cambridge University Press, Cambridge (2000)

  • Crowley, J., LeBlanc, M., Gentleman, R., Salmon, S.: Exploratory methods in survival analysis. In: Koul, H.L., Deshpande, J.V. (eds.) IMS Lecture Notes – Monograph Series 27, pp. 55–77. IMS, Hayward, CA (1995)

  • Crowley, J., LeBlanc, M., Jacobson, J., Salmon, S.: Some exploratory methods for survival data. In: Lin, D.Y., Fleming, T.R. (eds.) Proceedings of the First Seattle Symposium in Biostatistics. Springer, New York (1997)

  • Davis, R., Anderson, J.: Exponential survival trees. Stat. Med. 8, 947–962 (1989)

  • Desilva, G.L., Hull, J.J.: Proper noun detection in document images. Pattern Recogn. 27, 311–320 (1994)

  • Diggle, P.J., Liang, K.Y., Zeger, S.L.: Analysis of Longitudinal Data, Oxford Science Publications, New York (1994)

  • Donoho, D.L.: CART and best-ortho-basis: A connection. Ann. Stat. 25, 1870–1911 (1997)

  • Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugenics 7, 179–188 (1936)

  • Fitzmaurice, G., Laird, N.M.: A likelihood-based method for analyzing longitudinal binary responses. Biometrika 80, 141–151 (1993)

  • Fox, S.H., Whalen, G.F., Sanders, M.M., Burleson, J.A., Jennings, K., Kurtzman, S., Kreutzer, D.: Angiogenesis in normal tissue adjacent to colon cancer. J. Surg. Oncol. 69, 230–234 (1998)

  • Friedman, J.H.: A recursive partitioning decision rule for nonparametric classification. IEEE Trans. Comput. C-26, 404–407 (1977)

  • Frydman, H., Altman, E.I., Kao, D.-I.: Introducing recursive partitioning for financial classification: The case of financial distress. In: Altman, E.I. (ed.) Bankruptcy, Credit Risk and High Yield Junk Bonds, pp. 37–59. Blackwell Publishers, Malden, MA (2002)

  • Geman, D., Jedynak, B.: An active testing model for tracking roads in satellite images. IEEE Trans. Pattern Anal. Mach. Intell. 18, 1–14 (1996)

  • Genuer, R., Poggi, J.M., Tuleau, C.: Random Forests: some methodological insights, Rapport de Recherche, Institut National de Recherche en Informatique et en Automatique (2008)

  • Goldman, L., Cook, F., Johnson, P., Brand, D., Rouan, G., Lee, T.: Prediction of the need for intensive care in patients who come to emergency departments with acute chest pain. New Engl. J. Med. 334, 1498–1504 (1996)

  • Goldman, L., Weinberg, M., Olshen, R.A., Cook, F., Sargent, R. et al.: A computer protocol to predict myocardial infarction in emergency department patients with chest pain. New Engl. J. Med. 307, 588–597 (1982)

  • Gordon, L., Olshen, R.A.: Asymptotically efficient solutions to the classification problem. Ann. Stat. 6, 515–533 (1978)

  • Gordon, L., Olshen, R.A.: Consistent nonparametric regression from recursive partitioning schemes. J. Multivariate Anal. 10, 611–627 (1980)

  • Gordon, L., Olshen, R.A.: Almost surely consistent nonparametric regression from recursive partitioning schemes. J. Multivariate Anal. 15, 147–163 (1984)

  • Gordon, L., Olshen, R.A.: Tree-structured survival analysis. Canc. Treat. Rep. 69, 1065–1069 (1985)

  • Huang, X., Chen, S.D., Soong, S.J.: Piecewise exponential survival trees with time-dependent covariates. Biometrics 54, 1420–1433 (1998)

  • Inoue, K., Slaton, J.W., Karashima, T., Shuin, T., Sweeney, P., Millikan, R., Dinney, C.P.: The prognostic value of angiogenesis factor expression for predicting recurrence and metastasis of bladder cancer after neoadjuvant chemotherapy and radical cystectomy. Clin. Canc. Res. 6, 4866–4873 (2000)

  • Intrator, O., Kooperberg, C.: Trees and splines in survival analysis. Stat. Meth. Med. Res. 4, 237–262 (1995)

  • Ishwaran, H., Kogalur, U.B., Blackstone, E.H., Lauer, M.S.: Random survival forests. Ann. Appl. Stat. 2, 841–860 (2008)

  • Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22, 79–86 (1951)

  • Kwak, L.W., Halpern, J., Olshen, R.A., Horning, S.J.: Prognostic significance of actual dose intensity in diffuse large-cell lymphoma: results of a tree-structured survival analysis. J. Clin. Oncol. 8, 963–977 (1990)

  • LeBlanc, M., Crowley, J.: Relative risk trees for censored survival data. Biometrics 48, 411–425 (1992)

  • LeBlanc, M., Crowley, J.: Survival trees by goodness-of-split. J. Am. Stat. Assoc. 88, 457–467 (1993)

  • LeBlanc, M., Crowley, J.: A review of tree-based prognostic models. In: Thall, P.F. (ed.) Recent Advances in Clinical Trial Design and Analysis, pp. 113–124. Kluwer, New York (1995)

  • Levin, N., Zahavi, J., Olitsky, M.: Amos – A probability-driven, customer-oriented decision support system for target marketing of solo mailings. Eur. J. Oper. Res. 87, 708–721 (1995)

  • Lin, S., Cutler, D.J., Zwick, M.E., Chakravarti, A.: Haplotype inference in random population samples. Am. J. Hum. Genet. 71, 1129–1137 (2002)

  • Long, W.L., Griffith, J.L., Selker, H.P., D’Agostino, R.B.: A comparison of logistic regression to decision tree induction in a medical domain. Comput. Biomed. Res. 26, 74–97 (1993)

  • Lugosi, G., Nobel, A.B.: Consistency of data-driven histogram methods for density estimation and classification. Ann. Stat. 24, 687–706 (1996)

  • Miller, R.G.: Survival Analysis, Wiley, New York (1981)

  • Morgan, J.N., Sonquist, J.A.: Problems in the analysis of survey data and a proposal. J. Am. Stat. Assoc. 58, 415–434 (1963)

  • Nagata, K., Okano, Y., Nozawa, Y.: Differential expression of low Mr GTP-binding proteins in human megakaryoblastic leukemia cell line, MEG-01 and their possible involvement in the differentiation process. Thromb. Haemostasis 77, 368–375 (1997)

  • Nobel, A.B.: Histogram regression estimation using data-dependent partitions. Ann. Stat. 24, 1084–1105 (1996)

  • Nobel, A.B., Olshen, R.A.: Termination and continuity of greedy growing for tree structured vector quantizers. IEEE Trans. Inform. Theor. 42, 191–206 (1996)

  • Owens, E.A., Griffiths, R.E., Ratnatunga, K.U.: Using oblique decision trees for the morphological classification of galaxies. Mon. Not. Roy. Astron. Soc. 281, 153–157 (1996)

  • Pace, R.K.: Parametric, semiparametric and nonparametric estimation of characteristic values within mass assessment and hedonic pricing models. J. R. Estate Finance Econ. 11, 195–217 (1995)

  • Quinlan, J.R.: Unknown attribute values in induction. In: Proceedings of the Sixth International Machine Learning Workshop, Morgan Kaufmann, Cornell, New York (1989)

  • Segal, M.R.: Regression trees for censored data. Biometrics 44, 35–48 (1988)

  • Segal, M.R.: Tree-structured methods for longitudinal data. J. Am. Stat. Assoc. 87, 407–418 (1992)

  • Segal, M.R.: Extending the elements of tree-structured regression. Stat. Meth. Med. Res. 4, 219–236 (1995)

  • Segal, M.R., Bloch, D.A.: A comparison of estimated proportional hazards models and regression trees. Stat. Med. 8, 539–550 (1989)

  • Selker, H.P., Griffith, J.L., Patil, S., Long, W.L., D’Agostino, R.B.: A comparison of performance of mathematical predictive methods for medical diagnosis: Identifying acute cardiac ischemia among emergency department patients. J. Investig. Med. 43, 468–476 (1995)

  • Therneau, T.M., Grambsch, P.M., Fleming, T.R.: Martingale-based residuals for survival models. Biometrika 77, 147–160 (1990)

  • Toshina, K., Hirata, I., Maemura, K., Sasaki, S., Murano, M., Nitta, M., Yamauchi, H., Nishikawa, T., Hamamoto, N., Katsu, K.: Enprostil, a prostaglandin-E-2 analogue, inhibits interleukin-8 production of human colonic epithelial cell lines. Scand. J. Immunol. 52, 570–575 (2000)

  • Wang, M., Chen, X., Zhang, H.: Maximal conditional chi-square importance in random forests. Bioinformatics 26, 831–837 (2010)

  • Wasson, J.H., Sox, H.C., Neff, R.K., Goldman, L.: Clinical prediction rules: Applications and methodologic standards. New Engl. J. Med. 313, 793–799 (1985)

  • Yeates, L.C., Powis, G.: The expression of the molecular chaperone calnexin is decreased in cancer cells grown as colonies compared to monolayer. Biochem. Biophys. Res. Comm. 238, 66–70 (1997)

  • Zhang, H.P.: Splitting criteria in survival trees. In: Statistical Modelling: Proceedings of the 10th International Workshop on Statistical Modeling, pp. 305–314. Springer (1995)

  • Zhang, H.P.: Classification trees for multiple binary responses. J. Am. Stat. Assoc. 93, 180–193 (1998)

  • Zhang, H.P., Bracken, M.B.: Tree-based risk factor analysis of preterm delivery and small-for-gestational-age birth. Am. J. Epidemiol. 141, 70–78 (1995)

  • Zhang, H.P., Bracken, M.B.: Tree-based, two-stage risk factor analysis for spontaneous abortion. Am. J. Epidemiol. 144, 989–996 (1996)

  • Zhang, H.P., Crowley, J., Sox, H., Olshen, R.A.: Tree structural statistical methods. In: Encyclopedia of Biostatistics, vol. 6, pp. 4561–4573. Wiley, Chichester, England (2001)

  • Zhang, H.P., Holford, T., Bracken, M.B.: A tree-based method of analysis for prospective studies. Stat. Med. 15, 37–49 (1996)

  • Zhang, H.P., Singer, B.: Recursive Partitioning and Its Applications. Springer, New York (2010)

  • Zhang, H.P., Yu, C.Y., Singer, B.: Cell and tumor classification using gene expression data: Construction of forests. Proc. Natl. Acad. Sci. 100, 4168–4172 (2003)

  • Zhang, H.P., Yu, C.Y., Singer, B., Xiong, M.M.: Recursive partitioning for tumor classification with gene expression microarray data. Proc. Natl. Acad. Sci. 98, 6730–6735 (2001)

  • Zhao, L.P., Prentice, R.L.: Correlated binary regression using a quadratic exponential model. Biometrika 77, 642–648 (1990)

  • Zhang, H., Wang, M.: Search for the smallest random forest. Statistics and its Interface 2, 381–388 (2009)


Acknowledgements

This research is supported in part by grant R01DA016750 from the National Institute on Drug Abuse.

Author information

Corresponding author

Correspondence to Heping Zhang.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Zhang, H. (2012). Recursive Partitioning and Tree-based Methods. In: Gentle, J., Härdle, W., Mori, Y. (eds) Handbook of Computational Statistics. Springer Handbooks of Computational Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21551-3_29
