Abstract
In data analysis, the induction of decision trees serves two main goals: first, induced decision trees can be used to classify or predict new instances; second, they represent an easy-to-interpret model of the problem domain that can be used for explanation. The accuracy of the induced classifier is usually estimated by N-fold cross-validation, whereas for explanation purposes a decision tree induced from all the available data is used. Decision tree learning is relatively non-robust: a small change in the training set may significantly change the structure of the induced decision tree. This paper presents a decision tree construction method in which the domain model is built by consensus clustering of the N decision trees induced in N-fold cross-validation. Experimental results show that consensus decision trees are simpler than C4.5 decision trees, indicating that they may be a more stable approximation of the intended domain model than a decision tree constructed from the entire set of training instances.
References
Adams, E.N. (1972). Consensus techniques and the comparison of taxonomic trees. Systematic Zoology, 21, 390–397.
Breiman, L., Friedman, J., Olshen, R., and Stone, C. (1984). Classification and Regression Trees. Wadsworth International Group, Belmont, CA.
Breiman, L. (1996). Bagging predictors. Machine Learning, 24, 123–140.
Day, W.H.E. (1983). The role of complexity in comparing classifications. Mathematical Biosciences, 66, 97–114.
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7(1), 1–26.
Faith, D.P. (1988). Consensus applications in the biological sciences. In: Bock, H.H. (Ed.) Classification and Related Methods of Data Analysis, Amsterdam: North-Holland, 325–332.
Fisher, D.H. (1989). Noise-tolerant conceptual clustering. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, 825–830. San Francisco: Morgan Kaufmann.
Gordon, A.D. (1981). Classification. London: Chapman and Hall.
Hartigan, J.A. (1975). Clustering Algorithms. New York: Wiley.
Kohavi, R. (1995). Wrappers for performance enhancement and oblivious decision graphs. Doctoral dissertation, Stanford University.
Kononenko, I. and Bratko, I. (1991). Information based evaluation criterion for classifier's performance. Machine Learning, 6(1), 67–80.
Langley, P. (1996). Elements of Machine Learning. Morgan Kaufmann.
Leclerc, B. (1988). Consensus applications in the social sciences. In: Bock, H.H. (Ed.) Classification and Related Methods of Data Analysis, Amsterdam: North-Holland, 333–340.
McMorris, F.R. and Neuman, D. (1983). Consensus functions defined on trees. Mathematical Social Sciences, 4, 131–136.
Quinlan, J.R. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106.
Quinlan, J.R. (1993). C4.5: Programs for Machine Learning. California: Morgan Kaufmann.
Régnier, S. (1965). Sur quelques aspects mathématiques des problèmes de classification automatique. I.I.C. Bulletin, 4, 175–191.
Scheffer, T. and Herbrich, R. (1997). Unbiased assessment of learning algorithms. In Proceedings of the International Joint Conference on Artificial Intelligence, 798–803.
Sokal, R.R. and Sneath, P.H.A. (1963). Principles of Numerical Taxonomy. San Francisco: Freeman.
Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society, B 36, 111–147.
Witten, I.H. and Frank, E. (1999). Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco.
Zhang, J. (1992). On the distributional properties of model selection criteria. Journal of the American Statistical Association, 87(419), 732–737.
© 2001 Springer-Verlag Berlin Heidelberg
Kavšek, B., Lavrač, N., Ferligoj, A. (2001). Consensus Decision Trees: Using Consensus Hierarchical Clustering for Data Relabelling and Reduction. In: De Raedt, L., Flach, P. (eds) Machine Learning: ECML 2001. ECML 2001. Lecture Notes in Computer Science(), vol 2167. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44795-4_22
Print ISBN: 978-3-540-42536-6
Online ISBN: 978-3-540-44795-5