Abstract
RECPAM is a method for constructing regression trees from data that permits explicit treatment of a wide variety of response variables. It is shown that the RECPAM methodology can be used to solve two basic problems of data analysis: discovering and identifying classes in data (classification), and finding simple, economical rules for assigning individuals to classes (discrimination). The RECPAM-based approach to classification finds classes that are economically described in terms of some of the variables and such that the joint distribution of all variables is, at least approximately, homogeneous within classes and distinct across classes. RECPAM-based discrimination treats categorical and continuous variables simultaneously while respecting their distinct nature: it builds discrimination models that can be seen as a generalization of both tree-based models and multi-category logistic regression.
Copyright information
© 1994 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ciampi, A. (1994). Classification and Discrimination: the RECPAM Approach. In: Dutter, R., Grossmann, W. (eds) Compstat. Physica, Heidelberg. https://doi.org/10.1007/978-3-642-52463-9_13
DOI: https://doi.org/10.1007/978-3-642-52463-9_13
Publisher Name: Physica, Heidelberg
Print ISBN: 978-3-7908-0793-6
Online ISBN: 978-3-642-52463-9
eBook Packages: Springer Book Archive