Abstract
Numerical data poses a problem for symbolic learning methods, since numerical value ranges must be partitioned into intervals for representation and handling. An evaluation function is used to approximate the goodness of candidate partitions. Most existing methods for multisplitting on numerical attributes are based on heuristics because of their apparent efficiency advantages. We characterize a class of well-behaved cumulative evaluation functions for which the optimal multisplit can be found efficiently by dynamic programming. A single pass through the data suffices to evaluate multisplits of all arities. This class contains many important attribute evaluation functions familiar from symbolic machine learning research. Our empirical experiments show no significant difference in efficiency between the method that produces optimal partitions and those based on heuristics. Moreover, we demonstrate that optimal multisplitting can be beneficial in decision tree learning, in contrast to the widely applied binarization of numerical attributes or heuristic multisplitting.
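The dynamic-programming idea behind optimal multisplitting can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes size-weighted class entropy as one example of a cumulative evaluation function, recomputes interval costs by brute force for clarity, and (unlike an efficient single-pass realization) does not restrict candidate cut points to class-boundary positions. All names are illustrative.

```python
import math

def entropy(counts):
    """Class entropy of a frequency table (dict: class -> count)."""
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values() if c) if n else 0.0

def optimal_multisplit(points, max_arity):
    """points: list of (value, class_label) pairs sorted by value.
    Returns the minimal total impurity achievable for each arity 1..max_arity.
    Hypothetical sketch using size-weighted class entropy as the
    cumulative evaluation function."""
    n = len(points)

    def cost(i, j):
        # Impurity of the interval points[i:j]; brute force for clarity.
        counts = {}
        for _, label in points[i:j]:
            counts[label] = counts.get(label, 0) + 1
        return (j - i) * entropy(counts)

    INF = float("inf")
    # best[k][j]: minimal cost of splitting the prefix points[:j] into k intervals
    best = [[INF] * (n + 1) for _ in range(max_arity + 1)]
    best[0][0] = 0.0
    for k in range(1, max_arity + 1):
        for j in range(k, n + 1):
            # Try every position i for the last cut before j.
            best[k][j] = min(best[k - 1][i] + cost(i, j) for i in range(k - 1, j))
    return {k: best[k][n] for k in range(1, max_arity + 1)}
```

The recurrence relies only on the evaluation function being cumulative, i.e. interval costs summing over the partition, which is the well-behavedness property the abstract refers to; filling the table once yields the optimum for every arity simultaneously.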
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
Cite this paper
Elomaa, T., Rousu, J. (1997). Efficient multisplitting on numerical data. In: Komorowski, J., Zytkow, J. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 1997. Lecture Notes in Computer Science, vol 1263. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63223-9_117
Print ISBN: 978-3-540-63223-8
Online ISBN: 978-3-540-69236-2