Consolidated Trees: An Analysis of Structural Convergence

  • Chapter

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3755)

Abstract

When different subsamples of the same data set are used to induce classification trees, the structures of the resulting classifiers can differ substantially. Structural stability is of critical importance in many domains where the comprehensibility of the classifier is necessary, such as illness diagnosis, fraud detection, and customer behaviour analysis (marketing). We have developed a methodology for building classification trees from multiple subsamples in which the final classifier is a single decision tree (Consolidated Trees). This paper presents an analysis of the structural stability of our algorithm compared with the C4.5 algorithm. The classification trees generated with our algorithm achieve smaller error rates and are structurally steadier than those built by C4.5 when resampling techniques are used. The main focus of this paper is to show how Consolidated Trees built from different sets of subsamples tend to converge to the same tree as the number of subsamples is increased.
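
To make the consolidation idea concrete, the following is a minimal, hypothetical sketch, not the authors' implementation: it assumes that each subsample proposes the split it would make locally (here with plain information gain rather than C4.5's gain ratio), that a majority vote among the subsamples fixes the split actually installed at the node, and that every subsample is then partitioned with that same winning split before recursing. The names `best_split` and `consolidate` are illustrative; the exact split criterion, tie-breaking, and stopping rules of the Consolidated Trees algorithm are given in [13, 14].

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Return the feature with the largest information gain, or None if none helps."""
    base = entropy(labels)
    best_feat, best_gain = None, 0.0
    for f in features:
        parts = {}
        for row, y in zip(rows, labels):
            parts.setdefault(row[f], []).append(y)
        gain = base - sum(len(p) / len(labels) * entropy(p) for p in parts.values())
        if gain > best_gain:
            best_feat, best_gain = f, gain
    return best_feat


def consolidate(subsamples, features, depth=0, max_depth=5):
    """Grow ONE tree from several (rows, labels) subsamples by voting on splits."""
    votes = Counter()
    for rows, labels in subsamples:          # each subsample proposes a split
        f = best_split(rows, labels, features)
        if f is not None:
            votes[f] += 1
    pooled = [y for _, labels in subsamples for y in labels]
    if not votes or depth == max_depth:      # nothing to split on: majority leaf
        return Counter(pooled).most_common(1)[0][0]
    feat = votes.most_common(1)[0][0]        # consolidated (voted) split decision
    values = {row[feat] for rows, _ in subsamples for row in rows}
    children = {}
    for v in sorted(values):                 # partition EVERY subsample alike
        branch = []
        for rows, labels in subsamples:
            kept = [(r, y) for r, y in zip(rows, labels) if r[feat] == v]
            if kept:
                branch.append(([r for r, _ in kept], [y for _, y in kept]))
        children[v] = consolidate(branch, [g for g in features if g != feat],
                                  depth + 1, max_depth)
    return (feat, children)


# Toy usage: ten bootstrap subsamples drawn from a six-example data set.
random.seed(0)
data = [({"outlook": o, "windy": w}, y) for o, w, y in [
    ("sunny", "yes", "no"), ("sunny", "no", "no"), ("overcast", "no", "yes"),
    ("rain", "no", "yes"), ("rain", "yes", "no"), ("overcast", "yes", "yes")]]
subsamples = []
for _ in range(10):
    draw = [random.choice(data) for _ in range(len(data))]
    subsamples.append(([r for r, _ in draw], [y for _, y in draw]))
print(consolidate(subsamples, ["outlook", "windy"]))
```

Because each node installs the split preferred by a majority of subsamples, the influence of any single subsample shrinks as more subsamples vote, which is consistent with the structural convergence the paper analyses.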



References

  1. Bauer, E., Kohavi, R.: An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants. Machine Learning 36, 105–139 (1999)

  2. Blake, C.L., Merz, C.J.: UCI Repository of Machine Learning Databases, University of California, Irvine, Dept. of Information and Computer Sciences (1998), http://www.ics.uci.edu/~mlearn/MLRepository.html

  3. Breiman, L.: Bagging Predictors. Machine Learning 24, 123–140 (1996)

  4. Chan, P.K., Stolfo, S.J.: Toward Scalable Learning with Non-uniform Class and Cost Distributions: A Case Study in Credit Card Fraud Detection. In: Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining, pp. 164–168 (1998)

  5. Dietterich, T.G.: Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation 10(7), 1895–1924 (1998)

  6. Dietterich, T.G.: An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization. Machine Learning 40, 139–157 (2000)

  7. Domingos, P.: Knowledge acquisition from examples via multiple models. In: Proceedings of the 14th International Conference on Machine Learning, Nashville, TN, pp. 98–106 (1997)

  8. Drummond, C., Holte, R.C.: Exploiting the Cost (In)sensitivity of Decision Tree Splitting Criteria. In: Proceedings of the 17th International Conference on Machine Learning, pp. 239–246 (2000)

  9. Elkan, C.: The Foundations of Cost-Sensitive Learning. In: Proceedings of the 17th International Joint Conference on Artificial Intelligence, pp. 973–978 (2001)

  10. Freund, Y., Schapire, R.E.: Experiments with a New Boosting Algorithm. In: Proceedings of the 13th International Conference on Machine Learning, pp. 148–156 (1996)

  11. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, Heidelberg (2001) ISBN: 0-387-95284-5

  12. Japkowicz, N.: Learning from Imbalanced Data Sets: A Comparison of Various Strategies. In: Proceedings of the AAAI Workshop on Learning from Imbalanced Data Sets, Menlo Park, CA (2000)

  13. Pérez, J.M., Muguerza, J., Arbelaitz, O., Gurrutxaga, I.: A new algorithm to build consolidated trees: study of the error rate and steadiness. In: Proceedings of the Conference on Intelligent Information Systems, Zakopane, Poland (2004)

  14. Pérez, J.M., Muguerza, J., Arbelaitz, O., Gurrutxaga, I., Martín, J.I.: Behaviour of Consolidated Trees when using Resampling Techniques. In: Proceedings of the 4th International Workshop on Pattern Recognition in Information Systems, PRIS, Porto, Portugal (2004)

  15. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo (1993)

  16. Skurichina, M., Kuncheva, L.I., Duin, R.P.W.: Bagging and Boosting for the Nearest Mean Classifier: Effects of Sample Size on Diversity and Accuracy. In: Roli, F., Kittler, J. (eds.) MCS 2002. LNCS, vol. 2364, pp. 62–71. Springer, Heidelberg (2002)

  17. Turney, P.: Bias and the quantification of stability. Machine Learning 20, 23–33 (1995)

  18. Weiss, G.M., Provost, F.: Learning when Training Data are Costly: The Effect of Class Distribution on Tree Induction. Journal of Artificial Intelligence Research 19, 315–354 (2003)

  19. Windeatt, T., Ardeshir, G.: Boosted Tree Ensembles for Solving Multiclass Problems. In: Roli, F., Kittler, J. (eds.) MCS 2002. LNCS, vol. 2364, pp. 42–51. Springer, Heidelberg (2002)

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Pérez, J.M., Muguerza, J., Arbelaitz, O., Gurrutxaga, I., Martín, J.I. (2006). Consolidated Trees: An Analysis of Structural Convergence. In: Williams, G.J., Simoff, S.J. (eds) Data Mining. Lecture Notes in Computer Science (LNAI), vol 3755. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11677437_4

  • DOI: https://doi.org/10.1007/11677437_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-32547-5

  • Online ISBN: 978-3-540-32548-2

  • eBook Packages: Computer Science, Computer Science (R0)
