Rule Learning in a Nutshell

Foundations of Rule Learning

Part of the book series: Cognitive Technologies (COGTECH)

Abstract

This chapter gives a brief overview of inductive rule learning and may therefore serve as a guide through the rest of the book. Later chapters will expand upon the material presented here and discuss advanced approaches, whereas this chapter presents only the core concepts. The chapter describes search heuristics, rule quality criteria, and the basic covering algorithm; illustrates classification rule learning on simple propositional learning problems; shows how to use the learned rules for classifying new instances; and introduces the basic criteria and methodology for rule-set evaluation.
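
The following is a minimal, illustrative sketch (not the book's pseudocode) of the covering strategy and of classification with the learned rules summarized above. The helper find_best_rule and the rule interface (covers, head) are assumptions standing in for the search and representation introduced later in the chapter.

```python
# Minimal sketch of the covering (separate-and-conquer) strategy and of
# classification with the learned rules.  The helper `find_best_rule` and the
# rule interface (`covers`, `head`) are illustrative assumptions, standing in
# for the search and representation introduced later in the chapter.

def learn_rule_set(positives, negatives, find_best_rule):
    """Learn rules for one target class until all positive examples are covered."""
    rule_set = []
    remaining = list(positives)
    while remaining:
        rule = find_best_rule(remaining, negatives)
        if rule is None:        # no acceptable rule found for the remaining examples
            break
        rule_set.append(rule)
        # "Separate": remove the positives covered by the new rule,
        # then "conquer" the rest in the next iteration.
        remaining = [ex for ex in remaining if not rule.covers(ex)]
    return rule_set

def classify(rule_set, example, default_class):
    """Return the head of the first rule that covers the example, else the default."""
    for rule in rule_set:
        if rule.covers(example):
            return rule.head
    return default_class
```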

Notes

  1.

    This chapter is partly based on (Flach & Lavrač, 2003).

  2.

    The dataset is adapted from the well-known contact lenses dataset (Cendrowska, 1987; Witten & Frank, 2005).

  3.

    If the term ‘top-down hill-climbing’ sounds contradictory: hill-climbing refers to greedily moving towards a (local) optimum of the evaluation function, whereas top-down refers to the fact that the search proceeds by successively specializing the candidate rules, thereby moving downwards in the generalization hierarchy induced by the rules. A minimal sketch of this search (with an optional beam, cf. Note 4) follows these notes.

  4.

    Beam search is a heuristic search algorithm that explores a graph by expanding just a limited set of the most promising nodes (cf. also Sect. 6.3.1).

  5.

    Laplace will be defined in Sect. 2.7.

  6.

    If C > 2 classes are used, the relative frequency of each class should be estimated with \(\frac{\hat{P}_i + 1}{\sum_{j=1}^{C}\hat{P}_j + C}\), where \(\hat{P}_i\) is the number of examples of class i covered by the rule and C is the number of classes. However, if we estimate the probability that an example covered by the body of a rule is also covered by its head, we have a binary distinction even in multiclass problems. A small worked example follows these notes.

  7.

    Clark and Niblett (1989, p. 269) define the likelihood ratio statistic in the form \(\mathit{LRS}(\mathbf{r}) = 2 \cdot \sum_{i=1}^{C}\hat{P}_i \cdot \log_2 \frac{\hat{P}_i}{\mathbb{E}\hat{P}_i}\), where \(\mathbb{E}\hat{P}_i = \gamma_i \cdot \hat{E}\) is the expected number of examples of class \(c_i\) that the rule would cover if the covered examples were distributed with the same relative class frequencies as in the original dataset (\(\gamma_i\) being the relative frequency of class \(c_i\) in the training set and \(\hat{E}\) the number of examples covered by the rule). A simple transformation gives our formulation with ratios of observed relative frequencies and expected relative frequencies. A worked computation follows these notes.

  8.

    We assume that there are no contradictory examples in the training set, an assumption that does not always hold in practice, but the basic argument remains the same.

  9.

    At the time of this writing, the collection contains 177 datasets in a great variety of different domains, including bioinformatics, medical applications, financial prognosis, game playing, politics, and more.

  10.

    The beam is called a star in AQ’s terminology, and the beam width is called the star size.

  11.

    http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/
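
To make the search strategy of Notes 3 and 4 concrete, the following is a minimal sketch (not the book's pseudocode) of top-down refinement with a beam: starting from the most general rule, candidates are repeatedly specialized by adding one condition, only the beam_width best candidates are kept (the "star" of Note 10), and the search stops when no refinement improves the heuristic value. The helpers refine and evaluate and the rule representation are illustrative assumptions; with beam_width = 1 the procedure reduces to hill-climbing, giving one possible core for the find_best_rule helper used in the covering sketch after the abstract.

```python
# Illustrative sketch of top-down hill-climbing with a beam (Notes 3, 4, and 10).
# `refine(rule)` is assumed to return all specializations obtained by adding one
# condition; `evaluate(rule, pos, neg)` is assumed to return a heuristic value
# (e.g., the Laplace estimate).  With beam_width = 1 this is plain hill-climbing.

def find_best_rule(pos, neg, refine, evaluate, most_general_rule, beam_width=1):
    best = most_general_rule
    best_value = evaluate(best, pos, neg)
    beam = [most_general_rule]
    while beam:
        # Top-down step: specialize every rule in the beam by one condition.
        candidates = [r for rule in beam for r in refine(rule)]
        if not candidates:
            break
        # Keep only the most promising refinements (the "star" in AQ terminology).
        candidates.sort(key=lambda r: evaluate(r, pos, neg), reverse=True)
        beam = candidates[:beam_width]
        top_value = evaluate(beam[0], pos, neg)
        if top_value > best_value:      # hill-climbing: accept only improvements
            best, best_value = beam[0], top_value
        else:
            break                       # (local) optimum of the evaluation function
    return best
```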
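
As a small worked example for Note 6, the generalized Laplace-style estimate can be computed directly from the per-class coverage counts; the function below is an illustrative sketch, not code from the book.

```python
# Worked example of the generalized estimate from Note 6:
# the probability of class i is (P̂_i + 1) / (sum_j P̂_j + C).

def multiclass_laplace(covered_per_class):
    """covered_per_class[i] = number of covered examples of class i."""
    C = len(covered_per_class)
    total = sum(covered_per_class)
    return [(p_i + 1) / (total + C) for p_i in covered_per_class]

# A rule covering 6, 1, and 0 examples of three classes:
print(multiclass_laplace([6, 1, 0]))   # -> [0.7, 0.2, 0.1]
```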
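
Similarly, the likelihood ratio statistic of Note 7 can be computed from the class counts covered by a rule and the class distribution of the full training set. The sketch follows the formula as printed (log base 2); the function and example data are illustrative assumptions.

```python
from math import log2

# Likelihood ratio statistic from Note 7:
#   LRS(r) = 2 * sum_i P̂_i * log2(P̂_i / E[P̂_i]),  with E[P̂_i] = gamma_i * Ê,
# where gamma_i is the relative class frequency in the training set and Ê the
# total number of examples covered by the rule.

def lrs(covered_per_class, class_counts_in_dataset):
    E_hat = sum(covered_per_class)
    n = sum(class_counts_in_dataset)
    statistic = 0.0
    for p_i, n_i in zip(covered_per_class, class_counts_in_dataset):
        if p_i == 0:
            continue                     # the term 0 * log(0) is taken as 0
        expected = (n_i / n) * E_hat     # gamma_i * Ê
        statistic += p_i * log2(p_i / expected)
    return 2 * statistic

# A rule covering 6/1/0 examples in a dataset with 10 examples per class:
print(lrs([6, 1, 0], [10, 10, 10]))      # -> approximately 13.9
```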

References

  • Bayardo, R. J., Jr. (1997). Brute-force mining of high-confidence classification rules. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD-97) (pp. 123–126). Menlo Park, CA: AAAI.

  • Bergadano, F., Matwin, S., Michalski, R. S., & Zhang, J. (1992). Learning two-tiered descriptions of flexible concepts: The POSEIDON system. Machine Learning, 8, 5–43.

  • Blockeel, H., & Vanschoren, J. (2007). Experiment databases: Towards an improved experimental methodology in machine learning. In J. N. Kok, J. Koronacki, R. L. de Mántaras, S. Matwin, D. Mladenic, & A. Skowron (Eds.), Proceedings of the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD-07), Warsaw, Poland (pp. 6–17). Berlin, Germany/New York: Springer.

  • Bratko, I., & Muggleton, S. H. (1995). Applications of inductive logic programming. Communications of the ACM, 38(11), 65–70.

  • Bringmann, B., Nijssen, S., & Zimmermann, A. (2009). Pattern-based classification: A unifying perspective. In A. Knobbe & J. Fürnkranz (Eds.), From Local Patterns to Global Models: Proceedings of the ECML/PKDD-09 Workshop (LeGo-09), Bled, Slovenia (pp. 36–50).

  • Cendrowska, J. (1987). PRISM: An algorithm for inducing modular rules. International Journal of Man-Machine Studies, 27, 349–370.

  • Clark, P., & Boswell, R. (1991). Rule induction with CN2: Some recent improvements. In Proceedings of the 5th European Working Session on Learning (EWSL-91), Porto, Portugal (pp. 151–163). Berlin, Germany: Springer.

  • Clark, P., & Niblett, T. (1987). Induction in noisy domains. In I. Bratko & N. Lavrač (Eds.), Progress in Machine Learning. Wilmslow, UK: Sigma Press.

  • Clark, P., & Niblett, T. (1989). The CN2 induction algorithm. Machine Learning, 3(4), 261–283.

  • Cohen, W. W. (1995). Fast effective rule induction. In A. Prieditis & S. Russell (Eds.), Proceedings of the 12th International Conference on Machine Learning (ML-95), Lake Tahoe, CA (pp. 115–123). San Francisco: Morgan Kaufmann.

  • Cootes, A. P., Muggleton, S. H., & Sternberg, M. J. (2003). The automatic discovery of structural principles describing protein fold space. Journal of Molecular Biology, 330(4), 527–532.

  • De Raedt, L., & Van Laer, W. (1995). Inductive constraint logic. In K. Jantke, T. Shinohara, & T. Zeugmann (Eds.), Proceedings of the 5th Workshop on Algorithmic Learning Theory (ALT-95), Fukuoka, Japan (pp. 80–94). Berlin, Germany/New York: Springer.

  • Džeroski, S., & Bratko, I. (1992). Handling noise in inductive logic programming. In S. H. Muggleton & K. Furukawa (Eds.), Proceedings of the 2nd International Workshop on Inductive Logic Programming (ILP-92) (pp. 109–125). No. TM-1182 in ICOT Technical Memorandum, Institute for New Generation Computer Technology, Tokyo, Japan.

  • Džeroski, S., Cestnik, B., & Petrovski, I. (1993). Using the m-estimate in rule induction. Journal of Computing and Information Technology, 1, 37–46.

  • Flach, P., & Lavrač, N. (2003). Rule induction. In M. Berthold & D. J. Hand (Eds.), Intelligent data analysis (2nd ed., pp. 229–267). Berlin, Germany/New York: Springer.

  • Frank, A., & Asuncion, A. (2010). UCI machine learning repository. Irvine, CA: University of California, School of Information and Computer Science.

  • Frank, E., & Witten, I. H. (1998). Generating accurate rule sets without global optimization. In J. Shavlik (Ed.), Proceedings of the 15th International Conference on Machine Learning (ICML-98), Madison, WI (pp. 144–151). San Francisco: Morgan Kaufmann.

  • Friedman, J. H., & Fisher, N. I. (1999). Bump hunting in high-dimensional data. Statistics and Computing, 9(2), 123–143.

  • Fürnkranz, J. (1994a). Fossil: A robust relational learner. In F. Bergadano & L. De Raedt (Eds.), Proceedings of the 7th European Conference on Machine Learning (ECML-94), Catania, Italy (pp. 122–137). Berlin, Germany/New York: Springer.

  • Hühn, J., & Hüllermeier, E. (2009b). Furia: An algorithm for unordered fuzzy rule induction. Data Mining and Knowledge Discovery, 19(3), 293–319.

  • Joshi, S., Ramakrishnan, G., & Srinivasan, A. (2008). Feature construction using theory-guided sampling and randomised search. In F. Zelezný & N. Lavrac (Eds.), Proceedings of the 18th International Conference on Inductive Logic Programming (ILP-08), Prague, Czech Republic (pp. 140–157). Berlin, Germany/New York: Springer.

  • Jovanoski, V., & Lavrač, N. (2001). Classification rule learning with APRIORI-C. In P. Brazdil & A. Jorge (Eds.), Proceedings of the 10th Portuguese Conference on Artificial Intelligence (EPIA 2001), Porto, Portugal (pp. 44–51). Berlin, Germany/New York: Springer.

  • Kaufman, K. A., & Michalski, R. S. (2000). An adjustable rule learner for pattern discovery using the AQ methodology. Journal of Intelligent Information Systems, 14, 199–216.

  • King, R. D., Whelan, K. E., Jones, F. M., Reiser, P., Bryant, C., & Muggleton, S., et al. (2004). Functional genomic hypothesis generation and experimentation by a robot. Nature, 427, 247–252.

  • Kramer, S., Lavrač, N., & Flach, P. (2001). Propositionalization approaches to relational data mining. In S. Džeroski & N. Lavrač (Eds.), Relational data mining (pp. 262–291). Berlin, Germany: Springer.

  • Lavrač, N., & Džeroski, S. (1994a). Inductive logic programming: Techniques and applications. New York: Ellis Horwood.

  • Lavrač, N., Džeroski, S., & Grobelnik, M. (1991). Learning nonrecursive definitions of relations with LINUS. In Proceedings of the 5th European Working Session on Learning (EWSL-91), Porto, Portugal (pp. 265–281). Berlin, Germany: Springer.

  • Li, W., Han, J., & Pei, J. (2001). CMAR: Accurate and efficient classification based on multiple class-association rules. In Proceedings of the IEEE Conference on Data Mining (ICDM-01), San Jose, CA (pp. 369–376). Los Alamitos, CA: IEEE.

  • Liu, B., Hsu, W., & Ma, Y. (1998). Integrating classification and association rule mining. In R. Agrawal, P. Stolorz, & G. Piatetsky-Shapiro (Eds.), Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD-98) (pp. 80–86). Menlo Park, CA: AAAI.

  • Liu, B., Ma, Y., & Wong, C.-K. (2000). Improving an exhaustive search based rule learner. In D. A. Zighed, H. J. Komorowski, & J. M. Zytkow (Eds.), Proceedings of the 4th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD-2000), Lyon, France (pp. 504–509). Berlin, Germany: Springer.

  • Michalski, R. S. (1969). On the quasi-minimal solution of the covering problem. In Proceedings of the 5th International Symposium on Information Processing (FCIP-69), Bled, Yugoslavia (Switching circuits, Vol. A3, pp. 125–128).

  • Michalski, R. S. (1980). Pattern recognition and rule-guided inference. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2, 349–361.

  • Michalski, R. S., & Larson, J. B. (1978). Selection of most representative training examples and incremental generation of VL1 hypotheses: the underlying methodology and the description of programs ESEL and AQ11 (Tech. Rep. 78-867). Department of Computer Science, University of Illinois at Urbana-Champaign.

  • Michalski, R. S., Mozetič, I., Hong, J., & Lavrač, N. (1986). The multi-purpose incremental learning system AQ15 and its testing application to three medical domains. In Proceedings of the 5th National Conference on Artificial Intelligence (AAAI-86), Philadelphia (pp. 1041–1045). Menlo Park, CA: AAAI.

  • Mitchell, T. M. (1982). Generalization as search. Artificial Intelligence, 18(2), 203–226.

  • Mitchell, T. M. (1997). Machine learning. New York: McGraw Hill.

  • Mooney, R. J. (1995). Encouraging experimental results on learning CNF. Machine Learning, 19, 79–92.

  • Muggleton, S. H. (1995). Inverse entailment and Progol. New Generation Computing, 13(3,4), 245–286. Special Issue on Inductive Logic Programming.

  • Muggleton, S. H., & Firth, J. (2001). Relational rule induction with CProgol4.4: A tutorial introduction. In S. Džeroski & N. Lavrač (Eds.), Relational data mining (pp. 160–188). Berlin, Germany: Springer. Chap. 7.

  • Mutter, S., Hall, M., & Frank, E. (2004). Using classification to evaluate the output of confidence-based association rule mining. In G. I. Webb & X. Yu (Eds.), Proceedings of the Australian Joint Conference on Artificial Intelligence (AI-04), Cairns, QLD (pp. 538–549). Berlin, Germany: Springer.

  • Pagallo, G., & Haussler, D. (1990). Boolean feature discovery in empirical learning. Machine Learning, 5, 71–99.

  • Quinlan, J. R. (1983). Learning efficient classification procedures and their application to chess end games. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell, (Eds.), Machine learning. An artificial intelligence approach (pp. 463–482). Palo Alto, CA: Tioga.

  • Quinlan, J. R. (1987a). Generating production rules from decision trees. In Proceedings of the 10th International Joint Conference on Artificial Intelligence (IJCAI-87) (pp. 304–307). Los Altos, CA: Morgan Kaufmann.

  • Quinlan, J. R. (1990). Learning logical definitions from relations. Machine Learning, 5, 239–266.

  • Quinlan, J. R. (1991). Determinate literals in inductive logic programming. In Proceedings of the 8th International Workshop on Machine Learning (ML-91) (pp. 442–446). San Mateo, CA: Morgan Kaufmann

  • Quinlan, J. R. (1993). C4.5: Programs for machine learning. San Mateo, CA: Morgan Kaufmann.

  • Quinlan, J. R., & Cameron-Jones, R. M. (1995a). Induction of logic programs: FOIL and related systems. New Generation Computing, 13(3,4), 287–312. Special Issue on Inductive Logic Programming.

  • Ramakrishnan, G., Joshi, S., Balakrishnan, S., & Srinivasan, A. (2008). Using ILP to construct features for information extraction from semi-structured text. In H. Blockeel, J. Ramon, J. W. Shavlik, & P. Tadepalli (Eds.), Proceedings of the 17th International Conference on Inductive Logic Programming (ILP-07), Corvallis, OR (pp. 211–224). Springer.

  • Rivest, R. L. (1987). Learning decision lists. Machine Learning, 2, 229–246.

  • Soares, C. (2003). Is the UCI repository useful for data mining? In F. Moura-Pires & S. Abreu (Eds.), Proceedings of the 11th Portuguese Conference on Artificial Intelligence (EPIA-03), Beja, Portugal (pp. 209–223). Berlin, Germany/Heidelberg, Germany: Springer.

  • Specia, L., Srinivasan, A., Joshi, S., Ramakrishnan, G., & das Graças Volpe Nunes, M. (2009). An investigation into feature construction to assist word sense disambiguation. Machine Learning, 76(1), 109–136.

  • Srinivasan, A. (1999). The Aleph manual. http://web.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/.

  • Srinivasan, A., & King, R. D. (1997). Feature construction with inductive logic programming: A study of quantitative predictions of biological activity by structural attributes. In S. Muggleton (Ed.), Proceedings of the 6th International Workshop, on Inductive Logic Programming (ILP-96), Stockholm (pp. 89–104). Berlin, Germany/New York: Springer.

  • Sternberg, M. J., & Muggleton, S. H. (2003). Structure activity relationships (SAR) and pharmacophore discovery using inductive logic programming (ILP). QSAR and Combinatorial Science, 22(5), 527–532.

  • Theron, H., & Cloete, I. (1996). BEXA: A covering algorithm for learning propositional concept descriptions. Machine Learning, 24, 5–40.

  • Webb, G. I. (1995). OPUS: An efficient admissible algorithm for unordered search. Journal of Artificial Intelligence Research, 5, 431–465.

  • Webb, G. I. (2000). Efficient search for association rules. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2000), Boston (pp. 99–107). New York: ACM.

  • Webb, G. I., & Zhang, S. (2005). k-optimal rule discovery. Data Mining and Knowledge Discovery, 10(1), 39–79.

  • Witten, I. H., & Frank, E. (2005). Data mining: Practical machine learning tools and techniques with Java implementations (2nd ed.). Amsterdam/Boston: Morgan Kaufmann Publishers.

  • Yin, X., & Han, J. (2003). CPAR: Classification based on predictive association rules. In D. Barbará & C. Kamath (Eds.), Proceedings of the SIAM Conference on Data Mining (SDM-03) (pp. 331–335). Philadelphia: SIAM.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Fürnkranz, J., Gamberger, D., Lavrač, N. (2012). Rule Learning in a Nutshell. In: Foundations of Rule Learning. Cognitive Technologies. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75197-7_2

  • DOI: https://doi.org/10.1007/978-3-540-75197-7_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75196-0

  • Online ISBN: 978-3-540-75197-7
