Classification and Discrimination: the RECPAM Approach

Conference paper · Compstat

Abstract

RECPAM is a method for constructing regression trees from data that permits explicit treatment of a wide variety of response variables. It is shown that the RECPAM methodology can be used to solve two basic problems of data analysis: discovering and identifying classes in data (classification), and finding simple and economical rules for assigning individuals to classes (discrimination). The RECPAM-based approach to classification finds classes that are economically described in terms of a subset of the variables and such that the joint distribution of all variables is, at least approximately, homogeneous within classes and distinct across classes. RECPAM-based discrimination treats categorical and continuous variables simultaneously while respecting their distinct nature: it builds discrimination models that can be seen as a generalization of both tree-based models and multi-category logistic regression.
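To make the recursive-partition idea concrete, here is a minimal sketch. It is not the RECPAM software described in the paper, nor its actual split or pruning criteria: it simply grows a classification tree on a single continuous covariate, accepting a split only when the gain in multinomial log-likelihood exceeds an AIC-style penalty, and reads off class probabilities at the leaves. The function names (grow_tree, best_split, classify), the min_gain penalty, and the toy data are all illustrative assumptions.

```python
import numpy as np


def multinomial_loglik(labels, n_classes):
    """Multinomial log-likelihood of the labels under the node's own class proportions."""
    counts = np.bincount(labels, minlength=n_classes)
    probs = counts / counts.sum()
    nz = counts > 0
    return float(np.sum(counts[nz] * np.log(probs[nz])))


def best_split(x, y, n_classes):
    """Best threshold on one continuous covariate, scored by log-likelihood gain."""
    parent = multinomial_loglik(y, n_classes)
    best_thr, best_gain = None, 0.0
    for thr in np.unique(x)[:-1]:          # every candidate cut point
        left, right = y[x <= thr], y[x > thr]
        gain = (multinomial_loglik(left, n_classes)
                + multinomial_loglik(right, n_classes) - parent)
        if gain > best_gain:
            best_thr, best_gain = thr, gain
    return best_thr, best_gain


def grow_tree(x, y, n_classes, min_gain=4.0, min_size=10, depth=0, max_depth=3):
    """Recursive partition: keep splitting while the likelihood gain beats an
    AIC-style penalty (min_gain); the published method's pruning rules may differ."""
    thr, gain = best_split(x, y, n_classes)
    if thr is None or gain < min_gain or len(y) < 2 * min_size or depth >= max_depth:
        return {"leaf": True,
                "probs": np.bincount(y, minlength=n_classes) / len(y)}
    mask = x <= thr
    return {"leaf": False, "thr": float(thr), "gain": gain,
            "left": grow_tree(x[mask], y[mask], n_classes,
                              min_gain, min_size, depth + 1, max_depth),
            "right": grow_tree(x[~mask], y[~mask], n_classes,
                               min_gain, min_size, depth + 1, max_depth)}


def classify(tree, xi):
    """Drop one observation down the tree and return the leaf class probabilities."""
    while not tree["leaf"]:
        tree = tree["left"] if xi <= tree["thr"] else tree["right"]
    return tree["probs"]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy data: three regimes of a continuous covariate,
    # each dominated by a different class.
    x = rng.uniform(0.0, 3.0, size=300)
    y = np.where(x < 1.0, rng.choice(3, size=300, p=[0.8, 0.1, 0.1]),
        np.where(x < 2.0, rng.choice(3, size=300, p=[0.1, 0.8, 0.1]),
                          rng.choice(3, size=300, p=[0.1, 0.1, 0.8])))
    tree = grow_tree(x, y, n_classes=3)
    print(classify(tree, 0.5), classify(tree, 2.5))
```

A faithful implementation would also handle categorical covariates, mixed and multivariate responses, and the pruning and amalgamation steps that the RECPAM methodology adds on top of plain recursive partitioning; none of those are shown in this sketch.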

Copyright information

© 1994 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ciampi, A. (1994). Classification and Discrimination: the RECPAM Approach. In: Dutter, R., Grossmann, W. (eds) Compstat. Physica, Heidelberg. https://doi.org/10.1007/978-3-642-52463-9_13

  • DOI: https://doi.org/10.1007/978-3-642-52463-9_13

  • Publisher Name: Physica, Heidelberg

  • Print ISBN: 978-3-7908-0793-6

  • Online ISBN: 978-3-642-52463-9
