Abstract
In this paper we investigate several methods for producing smaller decision trees that reduce fragmentation by lowering the mean branching factor. All the methods considered achieve this goal by grouping the values that each attribute may take. We show how such grouping may be carried out using either top-down iterative splitting or bottom-up iterative merging, and how either technique may be applied globally, at the outset of tree construction, or locally, whenever a new node is considered. We also compare two approaches to assessing the quality of such attribute value groupings: information gain ratio, as employed in C4.5, and a combination of χ² and Cramér's V. The results of a comparative study of eight methods show that a top-down global method using χ² and Cramér's V consistently produces smaller trees with no loss of accuracy and no increase in computation time. These findings may be of considerable practical importance in data mining, since it is widely recognised that smaller trees are much easier to understand.
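For reference, Cramér's V normalises the χ² statistic of an r × c contingency table containing n observations to the range [0, 1] (cf. Healey, 1990):

    V = \sqrt{\frac{\chi^2}{n \cdot \min(r-1,\, c-1)}}

The sketch below illustrates how bottom-up iterative merging guided by χ² and Cramér's V might proceed. It is a minimal illustration under assumed details, in the spirit of CHAID-style category merging, not a reproduction of the authors' exact procedure; the function names cramers_v and merge_values_bottom_up are hypothetical. Each row of the contingency table counts one attribute value against the class labels, and the two groups whose class distributions differ least are merged while the difference remains statistically insignificant.

import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(table):
    # Cramér's V for an r x c contingency table
    # (rows: attribute-value groups, columns: class labels).
    table = np.asarray(table, dtype=float)
    chi2, _, _, _ = chi2_contingency(table, correction=False)
    r, c = table.shape
    return np.sqrt(chi2 / (table.sum() * min(r - 1, c - 1)))

def merge_values_bottom_up(table, alpha=0.05):
    # Greedily merge the two attribute-value groups whose class
    # distributions differ least (largest chi-squared p-value on their
    # 2 x c sub-table), while that difference is insignificant at level
    # alpha; keep at least two groups so a split remains possible.
    # Assumes every attribute value occurs at least once.
    rows = [row.astype(float) for row in np.asarray(table)]
    groups = [[i] for i in range(len(rows))]
    while len(rows) > 2:
        best_p, pair = -1.0, None
        for i in range(len(rows)):
            for j in range(i + 1, len(rows)):
                sub = np.array([rows[i], rows[j]])
                sub = sub[:, sub.sum(axis=0) > 0]  # drop classes absent from both rows
                if sub.shape[1] < 2:
                    p = 1.0                        # degenerate: indistinguishable rows
                else:
                    _, p, _, _ = chi2_contingency(sub, correction=False)
                if p > best_p:
                    best_p, pair = p, (i, j)
        if best_p <= alpha:        # all remaining groups differ significantly
            break
        i, j = pair
        rows.append(rows[i] + rows[j])
        groups.append(groups[i] + groups[j])
        for k in (j, i):           # delete the higher index first
            del rows[k], groups[k]
    return groups, cramers_v(np.vstack(rows))

Applied globally, such a grouping would be computed once per attribute before tree construction begins; applied locally, it would be recomputed at each candidate node from the training examples reaching it.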
References
L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth, Pacific Grove, CA, 1984.
L. A. Breslow and D. W. Aha. Simplifying Decision Trees: A Survey. Knowledge Engineering Review, 12:1–40, 1997.
B. Cestnik, I. Kononenko, and I. Bratko. A knowledge elicitation tool for sophisticated users. In I. Bratko and N. Lavrac, editors, Progress in Machine Learning. Sigma Press, Wilmslow, England, 1987.
P. R. Cohen and D. Jensen. Overfitting Explained. In Proc. Sixth International Workshop on Artificial Intelligence and Statistics, pages 115–122, Ft. Lauderdale, FL, 1997.
B. S. Everitt. Cluster Analysis. Heinemann, London, 2nd edition, 1980.
U. M. Fayyad and K. B. Irani. The attribute selection problem in decision tree generation. In Proc. Tenth National Conference on Artificial Intelligence, pages 104–110, San Jose, CA, 1992. AAAI Press.
J. Healey. Statistics: A Tool For Social Research. Wadsworth, Belmont, CA, 1990.
E. Hunt, J. Martin, and P. Stone. Experiments in Induction. Academic Press, New York, 1966.
R. Kohavi, G. John, D. Manley, and K. Pfleger. MLC++: A machine learning library in C++. In Tools with Artificial Intelligence, pages 740–743. IEEE Computer Society Press, 1994.
I. Kononenko. A counter example to the stronger version of the binary tree hypothesis. In ECML-95 Workshop on Statistics and Machine Learning in KDD, Crete, 1995.
T. Oates and D. Jensen. The Effects of Training Set Size on Decision Tree Complexity. In The Preliminary Papers of the Sixth International Workshop on Artificial Intelligence and Statistics, pages 379–390, 1997.
J. R. Quinlan. The effect of noise on concept learning. In R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, editors, Machine Learning: An Artificial Intelligence Approach, Volume II. Morgan Kaufmann, Los Altos, CA, 1986.
J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, 1993.
J. R. Quinlan. Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research, 4:77–90, 1996.
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
Cite this paper
Ho, K. M., Scott, P. D. (1998). Overcoming fragmentation in decision trees through attribute value grouping. In: Żytkow, J. M., Quafafou, M. (eds) Principles of Data Mining and Knowledge Discovery. PKDD 1998. Lecture Notes in Computer Science, vol 1510. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0094836
DOI: https://doi.org/10.1007/BFb0094836
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65068-3
Online ISBN: 978-3-540-49687-8