Combining classifiers by constructive induction

  • João Gama
Multiple Models for Classification
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1398)


Using multiple classifiers to increase learning accuracy is an active research area. In this paper we present a new general method for merging classifiers. The basic idea of Cascade Generalization is to run the set of classifiers sequentially, at each step extending the original data set by adding new attributes. The new attributes are derived from the class probability distribution given by a base classifier. This constructive step extends the representational language for the high-level classifiers, relaxing their bias. Cascade Generalization produces a single but structured model for the data that combines the model class representations of the base classifiers. We have performed an empirical evaluation of the Cascade composition of three well-known classifiers: Naive Bayes, Linear Discriminant, and C4.5. Composite models show an increase in performance, sometimes impressive, when compared with the corresponding single models, at statistically significant confidence levels.
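The constructive step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. The class names `GaussianNB`, `Stump` and `Cascade`, the helper `extend`, and the toy data are all hypothetical. A small Gaussian naive Bayes stands in for the base classifier, and a one-level decision stump stands in for a high-level tree learner such as C4.5.

```python
import math

class GaussianNB:
    """Tiny Gaussian naive Bayes: an assumed stand-in for a base classifier."""
    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.params = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            cols = list(zip(*rows))
            means = [sum(col) / len(col) for col in cols]
            # A small constant keeps every variance strictly positive.
            variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-6
                         for col, m in zip(cols, means)]
            self.params[c] = (len(rows) / len(y), means, variances)
        return self

    def predict_proba(self, x):
        scores = []
        for c in self.classes:
            prior, means, variances = self.params[c]
            s = math.log(prior)
            for v, m, var in zip(x, means, variances):
                s -= 0.5 * math.log(2 * math.pi * var) + (v - m) ** 2 / (2 * var)
            scores.append(s)
        top = max(scores)  # subtract the maximum before exponentiating
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        return [w / total for w in weights]

def majority(labels):
    return max(sorted(set(labels)), key=labels.count)

class Stump:
    """One-level decision tree: an assumed, much simplified stand-in for C4.5."""
    def fit(self, X, y):
        # Fallback: always predict the majority class.
        best = (len(y) + 1, 0, float("inf"), majority(y), majority(y))
        for j in range(len(X[0])):
            for t in sorted({x[j] for x in X}):
                left = [l for x, l in zip(X, y) if x[j] <= t]
                right = [l for x, l in zip(X, y) if x[j] > t]
                if not right:
                    continue
                lmaj, rmaj = majority(left), majority(right)
                errors = (sum(l != lmaj for l in left)
                          + sum(l != rmaj for l in right))
                if errors < best[0]:
                    best = (errors, j, t, lmaj, rmaj)
        _, self.attr, self.thr, self.left, self.right = best
        return self

    def predict(self, x):
        return self.left if x[self.attr] <= self.thr else self.right

def extend(X, base):
    """Constructive step: append the base classifier's class probabilities."""
    return [list(x) + base.predict_proba(x) for x in X]

class Cascade:
    """Runs the classifiers in sequence, extending the data at each step."""
    def __init__(self, levels):
        self.levels = levels  # every level except the last needs predict_proba

    def fit(self, X, y):
        data = [list(x) for x in X]
        for clf in self.levels[:-1]:
            clf.fit(data, y)
            data = extend(data, clf)
        self.levels[-1].fit(data, y)
        return self

    def predict(self, x):
        row = list(x)
        for clf in self.levels[:-1]:
            row = row + clf.predict_proba(row)
        return self.levels[-1].predict(row)

# Invented toy data: the classes are separated by the diagonal x0 = x1.
X = [[0, 2], [1, 3], [2, 4], [3, 5], [2, 0], [3, 1], [4, 2], [5, 3]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = Cascade([GaussianNB(), Stump()]).fit(X, y)
plain = Stump().fit(X, y)
```

On this toy data no single axis-parallel split separates the classes, so `plain` misclassifies some training points. The cascaded stump can instead split on one of the probability attributes added by the base classifier, which is the sense in which the extended representation relaxes the bias of the high-level learner.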





Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • João Gama, LIACC, FEP, University of Porto, Porto
