
TWO-CLASS Trees for Non-Parametric Regression Analysis

  • Conference paper
Classification and Multivariate Analysis for Complex Data Structures

Abstract

This paper shows that a regression tree problem can be turned into a classification tree problem, reducing the computational cost and providing useful aids to interpretation. A TWO-CLASS tree methodology for non-parametric regression analysis is introduced. The data are as follows: a numerical response variable and a set of predictors (categorical and/or numerical) are measured on a sample of objects, with no probability assumption; a non-parametric approach is therefore proposed. The concepts of prospective and retrospective splits are considered. The main idea is to grow a binary partition of the sample of objects such that, at each node of the tree structure, the numerical response is recoded into a dummy, or two-class, variable (called the theoretical response) on the basis of the optimal partition of the objects into two groups within the set of retrospective splits. A two-stage splitting criterion with a fast algorithm is applied: the best split of the objects is found among the candidate (prospective) splits of each predictor's modalities by maximizing the predictability of the two-class response. Applications to real-world cases and a simulation study demonstrate that the two-class splitting procedure is computationally less intensive than a standard regression tree procedure such as CART. Furthermore, the final partitions obtained by the two-class procedure and the standard one are very similar, in terms of the percentage of objects that fall together in the same terminal node. Some interpretation aids make it possible to describe the distribution of the response variable in the terminal nodes.
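
The node-level procedure summarized in the abstract can be illustrated with a minimal Python sketch for numerical predictors only. It is not the authors' algorithm, and two criteria are assumptions made for illustration: the retrospective split is taken to be the cut on the sorted response that minimizes the within-group sum of squares, and "predictability" of the two-class response is approximated by the decrease in Gini impurity. The function names `two_class_recode`, `gini` and `best_prospective_split` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def two_class_recode(y: Sequence[float]) -> Tuple[float, List[int]]:
    """Recode a numerical response into a 0/1 'theoretical response'.

    Assumption: the optimal two-group partition is the cut on the sorted
    response that minimizes the within-group sum of squares.
    """
    values = sorted(y)
    n = len(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)
    best_cut: Optional[float] = None
    best_wss = float("inf")
    left_sum = left_sq = 0.0
    for i in range(1, n):
        left_sum += values[i - 1]
        left_sq += values[i - 1] ** 2
        if values[i] == values[i - 1]:
            continue  # no cut is possible between tied values
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        wss = (left_sq - left_sum ** 2 / i) + (right_sq - right_sum ** 2 / (n - i))
        if wss < best_wss:
            best_wss = wss
            best_cut = (values[i - 1] + values[i]) / 2
    if best_cut is None:
        raise ValueError("response is constant; the node cannot be recoded")
    return best_cut, [int(v > best_cut) for v in y]


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a 0/1 label vector (0.0 for an empty node)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_prospective_split(
    X: Sequence[Sequence[float]], labels: Sequence[int]
) -> Tuple[Optional[int], Optional[float], float]:
    """Return (predictor index, threshold, impurity decrease).

    Assumption: the Gini decrease stands in for the predictability measure.
    """
    n = len(labels)
    parent = gini(labels)
    best: Tuple[Optional[int], Optional[float], float] = (None, None, 0.0)
    for j in range(len(X[0])):
        levels = sorted({row[j] for row in X})
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2
            left = [c for row, c in zip(X, labels) if row[j] <= t]
            right = [c for row, c in zip(X, labels) if row[j] > t]
            gain = parent - len(left) / n * gini(left) - len(right) / n * gini(right)
            if gain > best[2]:
                best = (j, t, gain)
    return best


# Toy data (made up for illustration): predictor 0 separates low from high responses.
y = [1.0, 1.2, 0.9, 5.1, 4.8, 5.3]
X = [[0.1, 7], [0.2, 3], [0.15, 9], [0.9, 4], [0.8, 8], [0.95, 2]]
cut, labels = two_class_recode(y)
feature, threshold, gain = best_prospective_split(X, labels)
```

Once the response is recoded, the search over predictors works on a binary label, so each candidate split is scored from class counts alone. A full tree would apply these two steps recursively to the child nodes.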

Acknowledgments

Financial support from the European FP6 Project iWebCare IST-4-02-8055 is acknowledged (Scientific Responsible: Prof. Roberta Siciliano).

Corresponding author

Correspondence to Roberta Siciliano.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Siciliano, R., Aria, M. (2011). TWO-CLASS Trees for Non-Parametric Regression Analysis. In: Fichet, B., Piccolo, D., Verde, R., Vichi, M. (eds) Classification and Multivariate Analysis for Complex Data Structures. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13312-1_6
