
Machine Learning, Volume 78, Issue 3, pp 343–379

On the quest for optimal rule learning heuristics

  • Frederik Janssen
  • Johannes Fürnkranz

Abstract

The primary goal of the research reported in this paper is to identify the criteria that are responsible for the good performance of a heuristic rule evaluation function in a greedy top-down covering algorithm. We first argue that search heuristics for inductive rule learning algorithms typically trade off consistency and coverage, and we investigate this trade-off by determining optimal parameter settings for five different parametrized heuristics. To avoid biasing our study toward known functional families, we also investigate the potential of metalearning for obtaining alternative rule learning heuristics. The key results of this experimental study are practical default values for commonly used heuristics and a broad comparative evaluation of known and novel rule learning heuristics; beyond these, we gain theoretical insights into the factors that are responsible for good performance. For example, we observe that consistency should be weighted more heavily than coverage, presumably because a lack of coverage can later be corrected by learning additional rules.

Keywords: Inductive rule learning · Heuristics · Metalearning
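
The abstract's central object is a greedy top-down covering (separate-and-conquer) algorithm whose rule refinement is guided by a heuristic that trades off consistency against coverage. The following minimal Python sketch illustrates that setting with the m-estimate, one commonly used parametrized heuristic of this kind; the refinement strategy, the helper names (covers, counts, learn_one_rule, separate_and_conquer), the boolean-attribute data format, and the parameter value m = 2 are illustrative assumptions, not the paper's implementation.

    def m_estimate(p, n, P, N, m=2.0):
        # Smoothed precision of a rule covering p positives and n negatives,
        # out of P/N total positives/negatives. Larger m pulls the score
        # toward the class prior (rewarding coverage); smaller m approaches
        # plain precision (rewarding consistency).
        prior = P / (P + N)
        return (p + m * prior) / (p + n + m)

    def covers(rule, x):
        # A rule is a list of (attribute, value) conditions, all of which must hold.
        return all(x[a] == v for a, v in rule)

    def counts(rule, data):
        p = sum(1 for x, y in data if y and covers(rule, x))
        n = sum(1 for x, y in data if not y and covers(rule, x))
        return p, n

    def learn_one_rule(data, attributes, P, N):
        # Greedy top-down refinement: repeatedly add the single condition
        # that most improves the heuristic; stop when nothing improves it.
        rule = []
        best = m_estimate(*counts(rule, data), P, N)
        while True:
            used = {a for a, _ in rule}
            candidates = [rule + [(a, v)]
                          for a in attributes if a not in used
                          for v in (0, 1)]
            if not candidates:
                return rule
            refinement = max(candidates,
                             key=lambda r: m_estimate(*counts(r, data), P, N))
            score = m_estimate(*counts(refinement, data), P, N)
            if score <= best:
                return rule
            rule, best = refinement, score

    def separate_and_conquer(data, attributes):
        # Covering loop: learn a rule, remove the examples it covers,
        # and repeat while uncovered positive examples remain.
        P = sum(1 for _, y in data if y)
        N = len(data) - P
        rules = []
        while any(y for _, y in data):
            rule = learn_one_rule(data, attributes, P, N)
            p, _ = counts(rule, data)
            if not rule or p == 0:  # no useful refinement; avoid looping forever
                break
            rules.append(rule)
            data = [(x, y) for x, y in data if not covers(rule, x)]
        return rules

    # Toy usage (hypothetical data): each example is ({attribute: 0/1}, class).
    data = [({"a": 1, "b": 0}, True), ({"a": 1, "b": 1}, True),
            ({"a": 0, "b": 1}, False), ({"a": 0, "b": 0}, False)]
    print(separate_and_conquer(data, ["a", "b"]))  # -> [[('a', 1)]]

The parameter m makes the trade-off explicit: m = 0 reduces the heuristic to precision (pure consistency), while increasing m pulls scores toward the class prior and thus rewards coverage. Finding good values for such parameters is exactly the kind of question the paper studies empirically.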


Copyright information

© The Author(s) 2009

Authors and Affiliations

  1. Technische Universität Darmstadt, Darmstadt, Germany
