Skip to main content

Multivariate Discretization by Recursive Supervised Bipartition of Graph

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2005)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 3587))

Abstract

In supervised learning, discretization of the continuous explanatory attributes enhances the accuracy of decision tree induction algorithms and naive Bayes classifier. Many discretization methods have been developped, leading to precise and comprehensible evaluations of the amount of information contained in one single attribute with respect to the target one.

In this paper, we discuss the multivariate notion of neighborhood, extending the univariate notion of interval. We propose an evaluation criterion of bipartitions, which is based on the Minimum Description Length (MDL) principle [1], and apply it recursively. The resulting discretization method is thus able to exploit correlations between continuous attributes. Its accuracy and robustness are evaluated on real and synthetic data sets.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Rissanen, J.: Modeling by shortest data description. Automatica 14, 465–471 (1978)

    Article  MATH  Google Scholar 

  2. Dougherty, J., Kohavi, R., Sahami, M.: Supervised and unsupervised discretization of continuous features. In: Proc. of the 12th ICML, pp. 194–202 (1995)

    Google Scholar 

  3. Kerber, R.: Chimerge discretization of numeric attributes. In: Tenth International Conference on Artificial Intelligence, pp. 123–128 (1991)

    Google Scholar 

  4. Quinlan, J.R.: C4.5: Programs for machine learning. Morgan Kaufmann, San Francisco (1993)

    Google Scholar 

  5. Fayyad, U., Irani, K.: On the handling of continuous-valued attributes in decision tree generation. Machine Learning 8, 87–102 (1992)

    MATH  Google Scholar 

  6. Boullé, M.: A bayesian approach for supervised discretization. In: Zanasi, A., Ebecken, N.F.F., Brebbia, C.A. (eds.) Data Mining V, pp. 199–208. WIT Press, Southampton (2004)

    Google Scholar 

  7. Bay, S.: Multivariate discretization of continuous variables for set mining. In: Proc. of the 6th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pp. 315–319 (2000)

    Google Scholar 

  8. Kwedlo, W., Kretowski, M.: An evolutionary algorithm using multivariate discretization for decision rule induction. In: Żytkow, J.M., Rauch, J. (eds.) PKDD 1999. LNCS (LNAI), vol. 1704, pp. 392–397. Springer, Heidelberg (1999)

    Chapter  Google Scholar 

  9. Jaromczyk, J.W., Toussaint, G.T.: Relative neighborhood graphs and their relatives. P-IEEE 80, 1502–1517 (1992)

    Article  Google Scholar 

  10. Cover, T.M., Hart, P.E.: Nearest neighbor pattern classification. Institute of Electrical and Electronics Engineers Transactions on Information Theory 13, 21–27 (1967)

    MATH  Google Scholar 

  11. Blake, C.L., Merz, C.J.: UCI repository of machine learning databases, http://www.ics.uci.edu/~mlearn/MLRepository.html

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ferrandiz, S., Boullé, M. (2005). Multivariate Discretization by Recursive Supervised Bipartition of Graph. In: Perner, P., Imiya, A. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2005. Lecture Notes in Computer Science(), vol 3587. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11510888_25

Download citation

  • DOI: https://doi.org/10.1007/11510888_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26923-6

  • Online ISBN: 978-3-540-31891-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics