Abstract
In this paper we investigate several methods for producing smaller decision trees that reduce fragmentation by lowering the mean branching factor. All the methods considered achieve this goal by grouping the values that each attribute may take. We show how such grouping may be carried out using either top-down iterative splitting or bottom-up iterative merging, and how either technique may be applied globally, at the outset of tree construction, or locally, whenever a new node is considered. We also compare two approaches to assessing the quality of such attribute value groupings: information gain ratio, as employed in C4.5, and a combination of χ² and Cramér's V. The results of a comparative study of eight methods show that a top-down global method using χ² and Cramér's V consistently produces smaller trees with no loss of accuracy and no increase in computation time. These findings may be of considerable practical importance in data mining, since it is widely recognised that smaller trees are much easier to understand.
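For reference, Cramér's V normalises the χ² statistic of an r × c contingency table containing n observations to the range [0, 1] (cf. Healey, 1990):

    V = \sqrt{\frac{\chi^2}{n \cdot \min(r-1,\, c-1)}}

The sketch below illustrates how bottom-up iterative merging guided by χ² and Cramér's V might proceed. It is a minimal illustration under assumed details, in the spirit of CHAID-style category merging, not a reproduction of the authors' exact procedure; the function names cramers_v and merge_values_bottom_up are hypothetical. Each row of the contingency table counts one attribute value against the class labels, and the two groups whose class distributions differ least are merged while the difference remains statistically insignificant.

import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(table):
    # Cramér's V for an r x c contingency table
    # (rows: attribute-value groups, columns: class labels).
    table = np.asarray(table, dtype=float)
    chi2, _, _, _ = chi2_contingency(table, correction=False)
    r, c = table.shape
    return np.sqrt(chi2 / (table.sum() * min(r - 1, c - 1)))

def merge_values_bottom_up(table, alpha=0.05):
    # Greedily merge the two attribute-value groups whose class
    # distributions differ least (largest chi-squared p-value on their
    # 2 x c sub-table), while that difference is insignificant at level
    # alpha; keep at least two groups so a split remains possible.
    # Assumes every attribute value occurs at least once.
    rows = [row.astype(float) for row in np.asarray(table)]
    groups = [[i] for i in range(len(rows))]
    while len(rows) > 2:
        best_p, pair = -1.0, None
        for i in range(len(rows)):
            for j in range(i + 1, len(rows)):
                sub = np.array([rows[i], rows[j]])
                sub = sub[:, sub.sum(axis=0) > 0]  # drop classes absent from both rows
                if sub.shape[1] < 2:
                    p = 1.0                        # degenerate: indistinguishable rows
                else:
                    _, p, _, _ = chi2_contingency(sub, correction=False)
                if p > best_p:
                    best_p, pair = p, (i, j)
        if best_p <= alpha:        # all remaining groups differ significantly
            break
        i, j = pair
        rows.append(rows[i] + rows[j])
        groups.append(groups[i] + groups[j])
        for k in (j, i):           # delete the higher index first
            del rows[k], groups[k]
    return groups, cramers_v(np.vstack(rows))

Applied globally, such a grouping would be computed once per attribute before tree construction begins; applied locally, it would be recomputed at each candidate node from the training examples reaching it.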
References
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth, Pacific Grove, CA, 1984.
L. A. Breslow and D. W. Aha. Simplifying Decision Trees: A Survey. Knowledge Engineering Review, 12:1–40, 1997.
B. Cestnik, I. Kononenko, and I. Bratko. A knowledge elicitation tool for sophisticated users. In I. Bratko and N. Lavrac, editors, Progress in Machine Learning. Sigma Press, Wilmslow, England, 1987.
P. R. Cohen and D. Jensen. Overfitting Explained. In Proc. Sixth International Workshop on Artificial Intelligence and Statistics, pages 115–122, Ft. Lauderdale, FL, 1997.
B. S. Everitt. Cluster Analysis. Heinemann, London, 2nd edition, 1980.
U. M. Fayyad and K. B. Irani. The attribute selection problem in decision tree generation. In Proc. Tenth National Conference on Artificial Intelligence, pages 104–110, San Jose, CA, 1992. AAAI Press.
J. Healey. Statistics: A Tool For Social Research. Wadsworth, Belmont, CA, 1990.
E. Hunt, J. Martin, and P. Stone. Experiments in Induction. Academic Press, New York, 1966.
R. Kohavi, G. John, D. Manley, and K. Pfleger. MLC++: A machine learning library in C++. In Tools with Artificial Intelligence, pages 740–743. IEEE Computer Society Press, 1994.
I. Kononenko. A counter example to the stronger version of the binary tree hypothesis. In ECML-95 Workshop on Statistics and Machine Learning in KDD, Crete, 1995.
T. Oates and D. Jensen. The Effects of Training Set Size on Decision Tree Complexity. In The Preliminary Papers of the Sixth International Workshop on Artificial Intelligence and Statistics, pages 379–390, 1997.
J. R. Quinlan. The effect of noise on concept learning. In R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, editors, Machine Learning: An Artificial Intelligence Approach, Volume II. Morgan Kaufmann, Los Altos, CA, 1986.
J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, 1993.
J. R. Quinlan. Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research, 4:77–90, 1996.
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
Cite this paper
Ho, K. M., Scott, P. D. (1998). Overcoming fragmentation in decision trees through attribute value grouping. In: Żytkow, J. M., Quafafou, M. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 1998. Lecture Notes in Computer Science, vol 1510. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0094836
DOI: https://doi.org/10.1007/BFb0094836
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65068-3
Online ISBN: 978-3-540-49687-8