Multivariate Discretization by Recursive Supervised Bipartition of Graph

Ferrandiz, Sylvain; Boullé, Marc

doi:10.1007/11510888_25

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3587)

Included in the following conference series:

International Workshop on Machine Learning and Data Mining in Pattern Recognition

Abstract

In supervised learning, discretization of the continuous explanatory attributes enhances the accuracy of decision tree induction algorithms and of the naive Bayes classifier. Many discretization methods have been developed, leading to precise and comprehensible evaluations of the amount of information contained in a single attribute with respect to the target attribute.

In this paper, we discuss the multivariate notion of neighborhood, extending the univariate notion of interval. We propose an evaluation criterion of bipartitions, which is based on the Minimum Description Length (MDL) principle [1], and apply it recursively. The resulting discretization method is thus able to exploit correlations between continuous attributes. Its accuracy and robustness are evaluated on real and synthetic data sets.
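
The idea of scoring a bipartition with an MDL-style cost and applying it recursively can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the cost function `group_cost` is a simplified stand-in for the authors' criterion, candidate bipartitions are plain axis-aligned cuts rather than graph-based neighborhoods, and all function names and the toy data are hypothetical.

```python
from collections import Counter
from math import lgamma, log


def log_binomial(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def group_cost(labels, n_classes):
    """Simplified description length of one group (assumed, not the paper's).

    First term: cost of stating the class counts.
    Second term: cost of stating the labels given those counts.
    """
    n = len(labels)
    counts = Counter(labels).values()
    count_cost = log_binomial(n + n_classes - 1, n_classes - 1)
    label_cost = lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)
    return count_cost + label_cost


def best_bipartition(idx, points, labels, n_classes):
    """Return (cost, left, right) for the cheapest axis-aligned cut, or None."""
    n, dims = len(idx), len(points[0])
    best = None
    for d in range(dims):
        order = sorted(idx, key=lambda i: points[i][d])
        for k in range(1, n):
            # A cut is only possible between two distinct values.
            if points[order[k - 1]][d] == points[order[k]][d]:
                continue
            left, right = order[:k], order[k:]
            cost = (log(dims * (n - 1))  # cost of naming the chosen cut
                    + group_cost([labels[i] for i in left], n_classes)
                    + group_cost([labels[i] for i in right], n_classes))
            if best is None or cost < best[0]:
                best = (cost, left, right)
    return best


def recursive_bipartition(idx, points, labels, n_classes):
    """Split a group while a bipartition shortens the description."""
    keep_cost = group_cost([labels[i] for i in idx], n_classes)
    best = best_bipartition(idx, points, labels, n_classes)
    if best is None or best[0] >= keep_cost:
        return [sorted(idx)]
    _, left, right = best
    return (recursive_bipartition(left, points, labels, n_classes)
            + recursive_bipartition(right, points, labels, n_classes))


# Toy data: the class depends on the first attribute only.
points = [(i / 20, (i * 7 % 20) / 20) for i in range(20)]
labels = [0 if i < 10 else 1 for i in range(20)]
groups = recursive_bipartition(list(range(20)), points, labels, n_classes=2)
print(groups)
```

On this toy data the recursion stops after a single cut on the first attribute, because splitting a pure group further only adds cost. A real multivariate method would replace the axis-aligned cuts with bipartitions that can follow correlations between attributes.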

References

[1] Rissanen, J.: Modeling by shortest data description. Automatica 14, 465–471 (1978)
[2] Dougherty, J., Kohavi, R., Sahami, M.: Supervised and unsupervised discretization of continuous features. In: Proc. of the 12th ICML, pp. 194–202 (1995)
[3] Kerber, R.: Chimerge discretization of numeric attributes. In: Tenth International Conference on Artificial Intelligence, pp. 123–128 (1991)
[4] Quinlan, J.R.: C4.5: Programs for machine learning. Morgan Kaufmann, San Francisco (1993)
[5] Fayyad, U., Irani, K.: On the handling of continuous-valued attributes in decision tree generation. Machine Learning 8, 87–102 (1992)
[6] Boullé, M.: A bayesian approach for supervised discretization. In: Zanasi, A., Ebecken, N.F.F., Brebbia, C.A. (eds.) Data Mining V, pp. 199–208. WIT Press, Southampton (2004)
[7] Bay, S.: Multivariate discretization of continuous variables for set mining. In: Proc. of the 6th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pp. 315–319 (2000)
[8] Kwedlo, W., Kretowski, M.: An evolutionary algorithm using multivariate discretization for decision rule induction. In: Żytkow, J.M., Rauch, J. (eds.) PKDD 1999. LNCS (LNAI), vol. 1704, pp. 392–397. Springer, Heidelberg (1999)
[9] Jaromczyk, J.W., Toussaint, G.T.: Relative neighborhood graphs and their relatives. Proceedings of the IEEE 80, 1502–1517 (1992)
[10] Cover, T.M., Hart, P.E.: Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13, 21–27 (1967)
[11] Blake, C.L., Merz, C.J.: UCI repository of machine learning databases, http://www.ics.uci.edu/~mlearn/MLRepository.html

Author information

Authors and Affiliations

France Télécom R&D, 2, avenue Pierre Marzin, 22307, Lannion Cedex, France
Sylvain Ferrandiz & Marc Boullé
Université de Caen, GREYC, Campus Côte de Nacre, boulevard du Maréchal Juin, BP 5186, 14032, Caen Cedex, France
Sylvain Ferrandiz

Editor information

Editors and Affiliations

Institute of Computer Vision and applied Computer Sciences, IBaI, Germany
Petra Perner
Institute of Media and Information Technology, Chiba University, Japan
Atsushi Imiya

About this paper

Cite this paper

Ferrandiz, S., Boullé, M. (2005). Multivariate Discretization by Recursive Supervised Bipartition of Graph. In: Perner, P., Imiya, A. (eds) Machine Learning and Data Mining in Pattern Recognition. MLDM 2005. Lecture Notes in Computer Science, vol 3587. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11510888_25

DOI: https://doi.org/10.1007/11510888_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26923-6
Online ISBN: 978-3-540-31891-0
eBook Packages: Computer Science, Computer Science (R0)
