Abstract
Numerical data poses a problem for symbolic learning methods, since numerical value ranges must be partitioned into intervals for representation and handling. An evaluation function is used to approximate the goodness of candidate partitions. Most existing methods for multisplitting on numerical attributes are based on heuristics because of their apparent efficiency advantages. We characterize a class of well-behaved cumulative evaluation functions for which the optimal multisplit can be found efficiently by dynamic programming. A single pass through the data suffices to evaluate multisplits of all arities. This class contains many important attribute evaluation functions familiar from symbolic machine learning research. Our empirical experiments show no significant difference in efficiency between the method that produces optimal partitions and those based on heuristics. Moreover, we demonstrate that optimal multisplitting can be beneficial in decision tree learning, in contrast to the widely applied binarization of numerical attributes or heuristic multisplitting.
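The dynamic-programming idea behind optimal multisplitting can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes size-weighted class entropy as one example of a cumulative evaluation function, recomputes interval costs by brute force for clarity, and (unlike an efficient single-pass realization) does not restrict candidate cut points to class-boundary positions. All names are illustrative.

```python
import math

def entropy(counts):
    """Class entropy of a frequency table (dict: class -> count)."""
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values() if c) if n else 0.0

def optimal_multisplit(points, max_arity):
    """points: list of (value, class_label) pairs sorted by value.
    Returns the minimal total impurity achievable for each arity 1..max_arity.
    Hypothetical sketch using size-weighted class entropy as the
    cumulative evaluation function."""
    n = len(points)

    def cost(i, j):
        # Impurity of the interval points[i:j]; brute force for clarity.
        counts = {}
        for _, label in points[i:j]:
            counts[label] = counts.get(label, 0) + 1
        return (j - i) * entropy(counts)

    INF = float("inf")
    # best[k][j]: minimal cost of splitting the prefix points[:j] into k intervals
    best = [[INF] * (n + 1) for _ in range(max_arity + 1)]
    best[0][0] = 0.0
    for k in range(1, max_arity + 1):
        for j in range(k, n + 1):
            # Try every position i for the last cut before j.
            best[k][j] = min(best[k - 1][i] + cost(i, j) for i in range(k - 1, j))
    return {k: best[k][n] for k in range(1, max_arity + 1)}
```

The recurrence relies only on the evaluation function being cumulative, i.e. interval costs summing over the partition, which is the well-behavedness property the abstract refers to; filling the table once yields the optimum for every arity simultaneously.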
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
Cite this paper
Elomaa, T., Rousu, J. (1997). Efficient multisplitting on numerical data. In: Komorowski, J., Zytkow, J. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 1997. Lecture Notes in Computer Science, vol 1263. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63223-9_117
Print ISBN: 978-3-540-63223-8
Online ISBN: 978-3-540-69236-2