Clustered Multiple Regression

  • Luis Torgo
  • J. Pinto da Costa
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)


This paper describes a new method for multiple regression problems. The method integrates a clustering technique with regression trees, yielding what we call clustered regression trees. We use the clustering method to form sub-samples of the given data that are similar in terms of the predictor variables. In this way we aim to facilitate the subsequent regression modeling, on the assumption that the regression surface has a certain smoothness. We then obtain a separate regression tree for each cluster found. These clustered regression trees predict the response value for a query case by an averaging process based on the cluster membership probabilities of the case. A series of experimental comparisons shows that our proposal has a significant predictive accuracy advantage over a single regression tree.
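The pipeline outlined in the abstract (cluster on the predictors, fit one tree per cluster, average the tree predictions by membership probability) can be illustrated with a minimal sketch. It is not the authors' implementation: plain k-means and a softmax over negative squared distances are stand-ins assumed here for the clustering method and its membership probabilities, and the tiny least-squares tree and all function names (`kmeans`, `memberships`, `fit_tree`, `fit_clustered_trees`, `predict`) are hypothetical.

```python
import math

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nearest(x, centers):
    return min(range(len(centers)), key=lambda c: sq_dist(x, centers[c]))

def kmeans(X, k, iters=20):
    """Plain k-means with a deterministic start (evenly spaced rows of X)."""
    centers = [list(X[i * (len(X) - 1) // max(k - 1, 1)]) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in X:
            groups[nearest(x, centers)].append(x)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster empties
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def memberships(x, centers):
    """Assumed soft membership: softmax of negative squared distances."""
    d = [sq_dist(x, c) for c in centers]
    m = min(d)  # subtract the minimum for numerical stability
    w = [math.exp(-(di - m)) for di in d]
    return [wi / sum(w) for wi in w]

def fit_tree(X, y, depth=3, min_leaf=2):
    """Tiny least-squares regression tree stored as nested tuples."""
    mean = sum(y) / len(y)
    if depth == 0 or len(y) < 2 * min_leaf:
        return ("leaf", mean)
    best = None
    for f in range(len(X[0])):
        order = sorted(range(len(X)), key=lambda i: X[i][f])
        for cut in range(min_leaf, len(order) - min_leaf + 1):
            lo, hi = order[:cut], order[cut:]
            if X[lo[-1]][f] == X[hi[0]][f]:
                continue  # cannot split between equal values
            sse = 0.0
            for idx in (lo, hi):
                mu = sum(y[i] for i in idx) / len(idx)
                sse += sum((y[i] - mu) ** 2 for i in idx)
            if best is None or sse < best[0]:
                best = (sse, f, (X[lo[-1]][f] + X[hi[0]][f]) / 2, lo, hi)
    if best is None:
        return ("leaf", mean)
    _, f, thr, lo, hi = best
    left = fit_tree([X[i] for i in lo], [y[i] for i in lo], depth - 1, min_leaf)
    right = fit_tree([X[i] for i in hi], [y[i] for i in hi], depth - 1, min_leaf)
    return ("node", f, thr, left, right)

def predict_tree(tree, x):
    while tree[0] == "node":
        _, f, thr, left, right = tree
        tree = left if x[f] <= thr else right
    return tree[1]

def fit_clustered_trees(X, y, k=2):
    """Cluster on the predictors only, then fit one tree per cluster."""
    centers = kmeans(X, k)
    trees = []
    for j in range(k):
        idx = [i for i, x in enumerate(X) if nearest(x, centers) == j]
        idx = idx or list(range(len(X)))  # fall back to all data if empty
        trees.append(fit_tree([X[i] for i in idx], [y[i] for i in idx]))
    return centers, trees

def predict(x, centers, trees):
    """Average the per-cluster tree predictions, weighted by membership."""
    p = memberships(x, centers)
    return sum(pj * predict_tree(t, x) for pj, t in zip(p, trees))

# Toy data with two regimes in a single predictor.
X = [[i / 10] for i in range(20)]
y = [1.0 if x[0] < 1.0 else 5.0 for x in X]
centers, trees = fit_clustered_trees(X, y, k=2)
print(predict([0.0], centers, trees), predict([1.9], centers, trees))
```

Because the weights are probabilities that sum to one, each prediction is a convex combination of the per-cluster tree outputs, so a query near a cluster boundary blends the neighbouring trees rather than switching abruptly between them.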


Regression Tree; Multivariate Adaptive Regression Spline; Cluster Membership; Average Mean Square Error; Bayesian Classification
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin · Heidelberg 2000

Authors and Affiliations

  • Luis Torgo (1)
  • J. Pinto da Costa (2)

  1. LIACC-FEP, University of Porto, Porto, Portugal
  2. LIACC-DMA, Faculty of Sciences, University of Porto, Porto, Portugal
