Generalized Conditional Entropy and a Metric Splitting Criterion for Decision Trees

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3918)

Abstract

We examine a new approach to building decision trees by introducing a geometric splitting criterion based on the properties of a family of metrics on the space of partitions of a finite set. This criterion can be adapted to the characteristics of the data sets and the needs of the users, and it yields decision trees that are smaller and have fewer leaves than trees built with standard methods, with comparable or better accuracy.
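
The abstract gives no formulas, so the following is only a minimal sketch of what a metric splitting criterion of this kind might look like. It assumes a Daróczy-style generalized entropy with parameter beta, a conditional entropy that weights each block by its relative size raised to the power beta, and a distance between two partitions defined as the sum of the two conditional entropies. The function names (`beta_entropy`, `conditional_beta_entropy`, `partition_distance`, `best_split`) are hypothetical, and the criterion actually used in the paper may differ, for example by normalizing the distance.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


def beta_entropy(labels: Sequence[Hashable], beta: float = 2.0) -> float:
    """Generalized (Daroczy-style) entropy of the partition induced by labels.

    Assumed form: (1 - sum p_i^beta) / (1 - 2^(1 - beta)); beta = 1 is treated
    as the limiting case, Shannon entropy in bits.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    probs = [count / n for count in Counter(labels).values()]
    if math.isclose(beta, 1.0):
        return -sum(p * math.log2(p) for p in probs)
    return (1.0 - sum(p ** beta for p in probs)) / (1.0 - 2.0 ** (1.0 - beta))


def conditional_beta_entropy(
    pi: Sequence[Hashable], sigma: Sequence[Hashable], beta: float = 2.0
) -> float:
    """Assumed conditional entropy H_beta(pi | sigma).

    Each block C of sigma contributes (|C| / n)^beta times the entropy of pi
    restricted to C.
    """
    if len(pi) != len(sigma):
        raise ValueError("both labelings must cover the same objects")
    n = len(pi)
    blocks = defaultdict(list)
    for p, s in zip(pi, sigma):
        blocks[s].append(p)
    return sum(
        (len(block) / n) ** beta * beta_entropy(block, beta)
        for block in blocks.values()
    )


def partition_distance(
    pi: Sequence[Hashable], sigma: Sequence[Hashable], beta: float = 2.0
) -> float:
    """Symmetric distance: H_beta(pi | sigma) + H_beta(sigma | pi)."""
    return conditional_beta_entropy(pi, sigma, beta) + conditional_beta_entropy(
        sigma, pi, beta
    )


def best_split(
    columns: Dict[str, Sequence[Hashable]],
    target: Sequence[Hashable],
    beta: float = 2.0,
) -> str:
    """Pick the attribute whose partition is closest to the class partition."""
    return min(columns, key=lambda name: partition_distance(columns[name], target, beta))
```

With `target = [0, 0, 1, 1]`, an attribute column `[0, 0, 1, 1]` induces the same partition as the class labels and is at distance zero, so `best_split` would prefer it over a column such as `[0, 1, 0, 1]`. Varying `beta` is one way such a criterion could be tuned to a data set, in the spirit of the adaptability the abstract mentions.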



Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Simovici, D.A., Jaroszewicz, S. (2006). Generalized Conditional Entropy and a Metric Splitting Criterion for Decision Trees. In: Ng, WK., Kitsuregawa, M., Li, J., Chang, K. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2006. Lecture Notes in Computer Science, vol 3918. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731139_7

  • DOI: https://doi.org/10.1007/11731139_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33206-0

  • Online ISBN: 978-3-540-33207-7

  • eBook Packages: Computer Science, Computer Science (R0)
