Abstract
We define a formula for estimating the coding costs of decision lists in propositional domains. The formula allows for multiple classes and for both categorical and numerical attributes. On artificial domains the formula performs quite satisfactorily, whereas results on natural domains are rather mixed and inconclusive. Further experiments lead to a principled simplification of the original formula which is robust in both artificial and natural domains. Simple hill-climbing search for the most compressive decision list significantly reduces the complexity of a given decision list without impairing, and sometimes even improving, its predictive accuracy.
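The hill-climbing search described above can be sketched roughly as follows. This is not the paper's actual coding-cost formula; the `rule_cost` and `exception_cost` functions below are simplified stand-ins (a per-condition attribute code and a binomial exception code, common in MDL-style rule learners), and `errors_fn` is a hypothetical callback that reports how many training examples a candidate list misclassifies. The sketch only illustrates the greedy loop: repeatedly delete whole rules while the total coding cost (model plus exceptions) keeps decreasing.

```python
import math

def rule_cost(rule, n_attributes):
    # Simplified model code: roughly log2 of the choices needed to
    # state each condition of the rule (an assumption, not the paper's formula).
    return math.log2(n_attributes + 1) * (len(rule) + 1)

def exception_cost(errors, n_examples):
    # Cost of identifying the misclassified examples among n_examples,
    # approximately log2(C(n_examples, errors)); a standard MDL exception code.
    return math.log2(math.comb(n_examples, errors) + 1)

def total_cost(dlist, errors_fn, n_attributes, n_examples):
    # Total description length = cost of the rules + cost of their exceptions.
    model = sum(rule_cost(r, n_attributes) for r in dlist)
    return model + exception_cost(errors_fn(dlist), n_examples)

def hill_climb_prune(dlist, errors_fn, n_attributes, n_examples):
    """Greedily delete whole rules while the total coding cost decreases."""
    best = list(dlist)
    best_cost = total_cost(best, errors_fn, n_attributes, n_examples)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            cand = best[:i] + best[i + 1:]  # candidate with rule i removed
            c = total_cost(cand, errors_fn, n_attributes, n_examples)
            if c < best_cost:
                best, best_cost, improved = cand, c, True
                break  # restart the scan from the shortened list
    return best
```

If the rules are fully redundant (removing them never adds errors), the search prunes the list all the way down, since each deletion saves model cost at no exception cost; with a realistic `errors_fn`, pruning stops as soon as the growth in exception cost outweighs the model-cost savings.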
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
Cite this paper
Pfahringer, B. (1997). Compression-based pruning of decision lists. In: van Someren, M., Widmer, G. (eds) Machine Learning: ECML-97. ECML 1997. Lecture Notes in Computer Science, vol 1224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62858-4_85
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62858-3
Online ISBN: 978-3-540-68708-5