Classification learning using all rules

  • Murlikrishna Viswanathan
  • Geoffrey I. Webb
Multiple Models for Classification
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1398)


The covering algorithm has been ubiquitous in the induction of classification rules. This approach to machine learning uses heuristic search that seeks a minimum number of rules that adequately explain the data. However, recent research has provided evidence that learning redundant classifiers can increase predictive accuracy. Learning all possible classifiers appears to be a plausible ultimate form of this notion of redundant classifiers. This paper presents an algorithm that, in effect, learns all classifiers. A preliminary investigation by Webb (1996b) suggested that a heuristic covering algorithm generally learns classification rules with higher predictive accuracy than those learned by this new approach. In this paper we present an extensive empirical comparison between the learning-all-rules algorithm and three established and varied approaches to inductive learning, namely a covering algorithm, an instance-based learner and a decision tree learner. The empirical evaluation provides strong evidence in support of learning all rules as a plausible approach to inductive learning.
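The abstract describes learning all rules only at a high level. The sketch below is a rough illustration of the idea and is not the authors' implementation. It enumerates every conjunctive attribute-value rule up to a small size and classifies a new case with the highest-scoring rule that covers it. Several choices here are assumptions made for illustration only: the Laplace accuracy estimate as the rule-quality measure, the `max_conds` limit, all function names, and the toy weather data.

```python
from itertools import combinations, product
from typing import Dict, List, Tuple

Example = Dict[str, str]
Conditions = Tuple[Tuple[str, str], ...]  # conjunction of attribute == value tests
Rule = Tuple[Conditions, str]             # (conditions, predicted class)


def covers(conds: Conditions, x: Example) -> bool:
    """True if every attribute-value test in the rule body holds for x."""
    return all(x[a] == v for a, v in conds)


def laplace(rule: Rule, data: List[Example], labels: List[str], n_classes: int) -> float:
    """Assumed quality measure: Laplace estimate (correct + 1) / (covered + n_classes)."""
    conds, cls = rule
    covered = [y for x, y in zip(data, labels) if covers(conds, x)]
    correct = sum(1 for y in covered if y == cls)
    return (correct + 1) / (len(covered) + n_classes)


def all_rules(data: List[Example], labels: List[str], max_conds: int = 2) -> List[Rule]:
    """Enumerate every conjunctive rule with up to max_conds conditions, for every class."""
    attrs = sorted(data[0])
    values = {a: sorted({x[a] for x in data}) for a in attrs}
    classes = sorted(set(labels))
    rules: List[Rule] = []
    for k in range(max_conds + 1):  # k = 0 yields the empty (always-true) rule body
        for chosen in combinations(attrs, k):
            for vals in product(*(values[a] for a in chosen)):
                conds = tuple(zip(chosen, vals))
                rules.extend((conds, cls) for cls in classes)
    return rules


def best_rule(x: Example, rules: List[Rule], data: List[Example],
              labels: List[str]) -> Tuple[Rule, float]:
    """Among all rules covering x, return the one with the highest Laplace estimate."""
    n_classes = len(set(labels))
    scored = [(r, laplace(r, data, labels, n_classes)) for r in rules if covers(r[0], x)]
    return max(scored, key=lambda rs: rs[1])  # never empty: the empty rule covers x


def classify_all_rules(x: Example, rules: List[Rule], data: List[Example],
                       labels: List[str]) -> str:
    """Predict the class of the best covering rule (hypothetical helper)."""
    return best_rule(x, rules, data, labels)[0][1]


# Hypothetical toy training data.
data = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "overcast", "windy": "no"},
]
labels = ["play", "stay", "stay", "play", "play"]

rules = all_rules(data, labels)  # 12 rule bodies x 2 classes = 24 rules
print(classify_all_rules({"outlook": "sunny", "windy": "yes"}, rules, data, labels))  # stay
```

Here the rule `windy = yes -> stay` (Laplace estimate 3/4) outscores every other rule covering the test case. Brute-force enumeration grows exponentially with the number of attributes, so this version is only practical for toy problems. A realistic system would presumably rely on efficient admissible search of the kind described in Webb (1995). A covering algorithm, by contrast, would retain only a small subset of these rules, chosen greedily to cover the training examples.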




  1. Aha, D. W. (1990). A Study of Instance-Based Algorithms for Supervised Learning Tasks. PhD Thesis, Department of Information and Computer Science, University of California, Irvine, Technical Report 90-42.
  2. Aha, D. W. (1997). Editorial on Lazy Learning. Artificial Intelligence Review, 11: 7–10.
  3. Aha, D. W., Kibler, D., and Albert, M. (1991). Instance-based learning algorithms. Machine Learning, 6: 37–66.
  4. Ali, K., Brunk, C., and Pazzani, M. (1994). On learning multiple descriptions of a concept. In Proceedings of Tools with Artificial Intelligence, New Orleans, LA.
  5. Breiman, L. (1996). Bagging predictors. Machine Learning, 24: 123–140.
  6. Clark, P. and Niblett, T. (1989). The CN2 induction algorithm. Machine Learning, 3: 261–284.
  7. Clark, P. and Boswell, R. (1991). Rule induction with CN2: Some recent improvements. In Proceedings of the Fifth European Working Session on Learning, pp. 151–163.
  8. Dietterich, T. G. and Bakiri, G. (1994). Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2: 263–286.
  9. Domingos, P. (1995). Rule induction and instance-based learning: A unified approach. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, Montreal, Morgan Kaufmann, pp. 1226–1232.
  10. Fix, E. and Hodges, J. L. (1952). Discriminatory analysis — Nonparametric discrimination: Consistency properties. From Project 21-49-004, Report Number 4, USAF School of Aviation Medicine, Randolph Field, Texas, pp. 261–279.
  11. Fayyad, U. M. and Irani, K. B. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, Morgan Kaufmann, pp. 1022–1027.
  12. Fayyad, U. M., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R. (1996). Advances in Knowledge Discovery and Data Mining. MIT Press, Menlo Park, CA.
  13. Friedman, J. H., Kohavi, R., and Yun, Y. (1996). Lazy decision trees. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, AAAI Press, Portland, OR, pp. 717–724.
  14. Kwok, S. W. and Carter, C. (1990). Multiple decision trees. In Shachter, R. D., Levitt, T. S., Kanal, L. N., and Lemmer, J. F. (Eds.), Uncertainty in Artificial Intelligence 4. North Holland, Amsterdam, pp. 327–335.
  15. Michalski, R. S. (1984). A theory and methodology of inductive learning. In Michalski, R. S., Carbonell, J. G., and Mitchell, T. M. (Eds.), Machine Learning: An Artificial Intelligence Approach. Springer-Verlag, Berlin, pp. 83–129.
  16. Merz, C. J. and Murphy, P. M. (1997). UCI Repository of machine learning databases [mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science.
  17. Muggleton, S. and Feng, C. (1990). Efficient induction of logic programs. In Proceedings of the First Conference on Algorithmic Learning Theory, Tokyo.
  18. Nock, R. and Gascuel, O. (1995). On learning decision committees. In Proceedings of the Twelfth International Conference on Machine Learning, Morgan Kaufmann, Tahoe City, CA, pp. 413–420.
  19. Oliver, J. J. and Hand, D. J. (1995). On pruning and averaging decision trees. In Proceedings of the Twelfth International Conference on Machine Learning, Morgan Kaufmann, Tahoe City, CA, pp. 430–437.
  20. Quinlan, J. R. (1990). Learning logical definitions from relations. Machine Learning, 5: 239–266.
  21. Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA.
  22. Rissanen, J. (1989). Stochastic Complexity in Statistical Inquiry. World Scientific, Singapore.
  23. Schapire, R. E. (1990). The strength of weak learnability. Machine Learning, 5: 197–227.
  24. Ting, K. M. (1995). Common Issues in Instance-Based and Naive Bayesian Classifiers. PhD Thesis, Basser Department of Computer Science, University of Sydney.
  25. Webb, G. I. (1993). Systematic search for categorical attribute-value data-driven machine learning. In AI'93 — Proceedings of the Sixth Australian Joint Conference on Artificial Intelligence, World Scientific, Melbourne, pp. 342–347.
  26. Webb, G. I. (1995). An efficient admissible algorithm for unordered search. Journal of Artificial Intelligence Research, 3: 431–465.
  27. Webb, G. I. (1996a). Further experimental evidence against the utility of Occam's razor. Journal of Artificial Intelligence Research, 4: 397–417.
  28. Webb, G. I. (1996b). A heuristic covering algorithm has higher predictive accuracy than learning all rules. In Proceedings of Information, Statistics and Induction in Science, Melbourne, pp. 20–30.
  29. Wogulis, J. and Langley, P. (1989). Improving efficiency by learning intermediate concepts. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, Morgan Kaufmann, San Mateo, CA, pp. 657–662.

Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Murlikrishna Viswanathan (1)
  • Geoffrey I. Webb (1)

  1. School of Computing and Mathematics, Deakin University, Geelong, Australia
