Abstract
This paper shows that a regression tree problem can be recast as a classification tree problem, which reduces the computational cost and yields useful aids to interpretation. A TWO-CLASS tree methodology for non-parametric regression analysis is introduced. The data consist of a numerical response variable and a set of predictors (categorical and/or numerical) measured on a sample of objects, with no probability assumptions; hence a non-parametric approach is proposed. The concepts of prospective and retrospective splits are considered. The main idea is to grow a binary partition of the sample of objects such that, at each node of the tree structure, the numerical response is recoded into a dummy, or two-class, variable (called the theoretical response) on the basis of the optimal partition of the objects into two groups within the set of retrospective splits. A two-stage splitting criterion with a fast algorithm is then applied: the best split of the objects is found within the set of candidate (prospective) splits of each predictor's modalities by maximizing the predictability of the two-class response. Applications to real-world cases and a simulation study demonstrate that the two-class splitting procedure is computationally less intensive than a standard regression tree procedure such as CART. Furthermore, the final partitions obtained by the two-class procedure and the standard one are very similar in terms of the percentage of objects that fall together in the same terminal node. Some interpretation aids help describe the distribution of the response variable in the terminal nodes.
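The node-level recoding described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a retrospective split is a cut on the response values themselves, chosen to maximize the between-group sum of squares, and it uses the decrease in Gini impurity as a stand-in for the predictability measure, which the abstract does not specify. Only numerical predictors are handled, the response is assumed to be non-empty, and all function names (`retrospective_cut`, `prospective_cut`, `two_class_split`) are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple


def retrospective_cut(y: Sequence[float]) -> float:
    """Cut on the response that maximizes the between-group sum of squares."""
    ys = sorted(y)
    n, total = len(ys), sum(ys)
    best_score, best_cut = float("-inf"), ys[0]
    left_sum = 0.0
    for i in range(1, n):
        left_sum += ys[i - 1]
        if ys[i] == ys[i - 1]:
            continue  # no cut is possible between equal values
        right_sum = total - left_sum
        # Equal to the between-group sum of squares up to an additive constant.
        score = left_sum ** 2 / i + right_sum ** 2 / (n - i)
        if score > best_score:
            best_score, best_cut = score, (ys[i - 1] + ys[i]) / 2
    return best_cut


def gini(z: Sequence[int]) -> float:
    """Gini impurity of a 0/1 label vector."""
    if not z:
        return 0.0
    p = sum(z) / len(z)
    return 2 * p * (1 - p)


def prospective_cut(x: Sequence[float], z: Sequence[int]) -> Tuple[Optional[float], float]:
    """Best cut on one predictor for the two-class response (assumed Gini criterion)."""
    n, parent = len(x), gini(z)
    best_cut, best_dec = None, 0.0
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2
        left = [zi for xi, zi in zip(x, z) if xi <= cut]
        right = [zi for xi, zi in zip(x, z) if xi > cut]
        dec = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
        if dec > best_dec:
            best_cut, best_dec = cut, dec
    return best_cut, best_dec


def two_class_split(
    predictors: Dict[str, Sequence[float]], y: Sequence[float]
) -> Tuple[Optional[str], Optional[float], float, float]:
    """Recode y into two classes, then pick the predictor split that predicts them best."""
    y_cut = retrospective_cut(y)
    z = [1 if yi > y_cut else 0 for yi in y]  # two-class "theoretical response"
    best_name, best_cut, best_dec = None, None, 0.0
    for name, x in predictors.items():
        cut, dec = prospective_cut(x, z)
        if cut is not None and dec > best_dec:
            best_name, best_cut, best_dec = name, cut, dec
    return best_name, best_cut, best_dec, y_cut


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    y = [1.0, 1.2, 0.9, 5.1, 4.8, 5.3]
    predictors = {"x1": [1, 2, 3, 10, 11, 12], "x2": [5, 1, 4, 2, 6, 3]}
    print(two_class_split(predictors, y))
```

Applying `two_class_split` recursively to the two child samples would grow a tree. Searching predictor splits against a binary response rather than the raw numerical response is where the abstract locates the computational saving, although this sketch makes no attempt to reproduce the fast algorithm it mentions.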
Acknowledgments
Financial support was provided by the European FP6 Project iWebCare IST-4-02-8055 (scientific coordinator: Prof. Roberta Siciliano).
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Siciliano, R., Aria, M. (2011). TWO-CLASS Trees for Non-Parametric Regression Analysis. In: Fichet, B., Piccolo, D., Verde, R., Vichi, M. (eds) Classification and Multivariate Analysis for Complex Data Structures. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13312-1_6
DOI: https://doi.org/10.1007/978-3-642-13312-1_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13311-4
Online ISBN: 978-3-642-13312-1
eBook Packages: Mathematics and Statistics (R0)