Rule Learning in a Nutshell

Foundations of Rule Learning

Part of the book series: Cognitive Technologies (COGTECH)

Abstract

This chapter gives a brief overview of inductive rule learning and may therefore serve as a guide through the rest of the book. Later chapters will expand upon the material presented here and discuss advanced approaches, whereas this chapter presents only the core concepts. The chapter describes search heuristics, rule quality criteria, and the basic covering algorithm; illustrates classification rule learning on simple propositional learning problems; shows how to use the learned rules for classifying new instances; and introduces the basic criteria and methodology for rule-set evaluation.
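
The following is a minimal, illustrative sketch (not the book's pseudocode) of the covering strategy and of classification with the learned rules summarized above. The helper find_best_rule and the rule interface (covers, head) are assumptions standing in for the search and representation introduced later in the chapter.

```python
# Minimal sketch of the covering (separate-and-conquer) strategy and of
# classification with the learned rules.  The helper `find_best_rule` and the
# rule interface (`covers`, `head`) are illustrative assumptions, standing in
# for the search and representation introduced later in the chapter.

def learn_rule_set(positives, negatives, find_best_rule):
    """Learn rules for one target class until all positive examples are covered."""
    rule_set = []
    remaining = list(positives)
    while remaining:
        rule = find_best_rule(remaining, negatives)
        if rule is None:        # no acceptable rule found for the remaining examples
            break
        rule_set.append(rule)
        # "Separate": remove the positives covered by the new rule,
        # then "conquer" the rest in the next iteration.
        remaining = [ex for ex in remaining if not rule.covers(ex)]
    return rule_set

def classify(rule_set, example, default_class):
    """Return the head of the first rule that covers the example, else the default."""
    for rule in rule_set:
        if rule.covers(example):
            return rule.head
    return default_class
```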

Notes

  1.

    This chapter is partly based on (Flach & Lavrač, 2003).

  2.

    The dataset is adapted from the well-known contact lenses dataset (Cendrowska, 1987; Witten & Frank, 2005).

  3.

    If the term ‘top-down hill-climbing’ sounds contradictory: hill-climbing refers to greedily moving towards a (local) optimum of the evaluation function, whereas top-down refers to the fact that the search proceeds by successively specializing the candidate rules, thereby moving downwards in the generalization hierarchy induced by the rules. A minimal sketch of this search (with an optional beam, cf. Note 4) follows these notes.

  4.

    Beam search is a heuristic search algorithm that explores a graph by expanding just a limited set of the most promising nodes (cf. also Sect. 6.3.1).

  5.

    Laplace will be defined in Sect. 2.7.

  6.

    If C > 2 classes are used, the relative frequency of each class should be estimated with \(\frac{\hat{P}_i + 1}{\sum_{j=1}^{C}\hat{P}_j + C}\), where \(\hat{P}_i\) is the number of examples of class i covered by the rule and C is the number of classes. However, if we estimate the probability that an example covered by the body of a rule is also covered by its head, we have a binary distinction even in multiclass problems. A small worked example follows these notes.

  7.

    Clark and Niblett (1989, p. 269) define the likelihood ratio statistic in the form \(\mathit{LRS}(\mathbf{r}) = 2 \cdot \sum_{i=1}^{C}\hat{P}_i \cdot \log_2 \frac{\hat{P}_i}{\mathbb{E}\hat{P}_i}\), where \(\mathbb{E}\hat{P}_i = \gamma_i \cdot \hat{E}\) is the expected number of examples of class \(c_i\) that the rule would cover if the covered examples were distributed with the same relative class frequencies as in the original dataset (\(\gamma_i\) being the relative frequency of class \(c_i\) in the training set and \(\hat{E}\) the number of examples covered by the rule). A simple transformation gives our formulation with ratios of observed relative frequencies and expected relative frequencies. A worked computation follows these notes.

  8.

    We assume that there are no contradictory examples in the training set, an assumption that does not always hold in practice, but the basic argument remains the same.

  9.

    At the time of this writing, the collection contains 177 datasets in a great variety of different domains, including bioinformatics, medical applications, financial prognosis, game playing, politics, and more.

  10.

    The beam is called a star in AQ’s terminology, and the beam width is called the star size.

  11.

    http://www.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/
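
To make the search strategy of Notes 3 and 4 concrete, the following is a minimal sketch (not the book's pseudocode) of top-down refinement with a beam: starting from the most general rule, candidates are repeatedly specialized by adding one condition, only the beam_width best candidates are kept (the "star" of Note 10), and the search stops when no refinement improves the heuristic value. The helpers refine and evaluate and the rule representation are illustrative assumptions; with beam_width = 1 the procedure reduces to hill-climbing, giving one possible core for the find_best_rule helper used in the covering sketch after the abstract.

```python
# Illustrative sketch of top-down hill-climbing with a beam (Notes 3, 4, and 10).
# `refine(rule)` is assumed to return all specializations obtained by adding one
# condition; `evaluate(rule, pos, neg)` is assumed to return a heuristic value
# (e.g., the Laplace estimate).  With beam_width = 1 this is plain hill-climbing.

def find_best_rule(pos, neg, refine, evaluate, most_general_rule, beam_width=1):
    best = most_general_rule
    best_value = evaluate(best, pos, neg)
    beam = [most_general_rule]
    while beam:
        # Top-down step: specialize every rule in the beam by one condition.
        candidates = [r for rule in beam for r in refine(rule)]
        if not candidates:
            break
        # Keep only the most promising refinements (the "star" in AQ terminology).
        candidates.sort(key=lambda r: evaluate(r, pos, neg), reverse=True)
        beam = candidates[:beam_width]
        top_value = evaluate(beam[0], pos, neg)
        if top_value > best_value:      # hill-climbing: accept only improvements
            best, best_value = beam[0], top_value
        else:
            break                       # (local) optimum of the evaluation function
    return best
```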
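
As a small worked example for Note 6, the generalized Laplace-style estimate can be computed directly from the per-class coverage counts; the function below is an illustrative sketch, not code from the book.

```python
# Worked example of the generalized estimate from Note 6:
# the probability of class i is (P̂_i + 1) / (sum_j P̂_j + C).

def multiclass_laplace(covered_per_class):
    """covered_per_class[i] = number of covered examples of class i."""
    C = len(covered_per_class)
    total = sum(covered_per_class)
    return [(p_i + 1) / (total + C) for p_i in covered_per_class]

# A rule covering 6, 1, and 0 examples of three classes:
print(multiclass_laplace([6, 1, 0]))   # -> [0.7, 0.2, 0.1]
```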
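
Similarly, the likelihood ratio statistic of Note 7 can be computed from the class counts covered by a rule and the class distribution of the full training set. The sketch follows the formula as printed (log base 2); the function and example data are illustrative assumptions.

```python
from math import log2

# Likelihood ratio statistic from Note 7:
#   LRS(r) = 2 * sum_i P̂_i * log2(P̂_i / E[P̂_i]),  with E[P̂_i] = gamma_i * Ê,
# where gamma_i is the relative class frequency in the training set and Ê the
# total number of examples covered by the rule.

def lrs(covered_per_class, class_counts_in_dataset):
    E_hat = sum(covered_per_class)
    n = sum(class_counts_in_dataset)
    statistic = 0.0
    for p_i, n_i in zip(covered_per_class, class_counts_in_dataset):
        if p_i == 0:
            continue                     # the term 0 * log(0) is taken as 0
        expected = (n_i / n) * E_hat     # gamma_i * Ê
        statistic += p_i * log2(p_i / expected)
    return 2 * statistic

# A rule covering 6/1/0 examples in a dataset with 10 examples per class:
print(lrs([6, 1, 0], [10, 10, 10]))      # -> approximately 13.9
```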

References

  • Bayardo, R. J., Jr. (1997). Brute-force mining of high-confidence classification rules. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD-97) (pp. 123–126). Menlo Park, CA: AAAI.

  • Bergadano, F., Matwin, S., Michalski, R. S., & Zhang, J. (1992). Learning two-tiered descriptions of flexible concepts: The POSEIDON system. Machine Learning, 8, 5–43.

  • Blockeel, H., & Vanschoren, J. (2007). Experiment databases: Towards an improved experimental methodology in machine learning. In J. N. Kok, J. Koronacki, R. L. de Mántaras, S. Matwin, D. Mladenic, & A. Skowron (Eds.), Proceedings of the 11th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD-07), Warsaw, Poland (pp. 6–17). Berlin, Germany/New York: Springer.

  • Bratko, I., & Muggleton, S. H. (1995). Applications of inductive logic programming. Communications of the ACM, 38(11), 65–70.

  • Bringmann, B., Nijssen, S., & Zimmermann, A. (2009). Pattern-based classification: A unifying perspective. In A. Knobbe & J. Fürnkranz (Eds.), From Local Patterns to Global Models: Proceedings of the ECML/PKDD-09 Workshop (LeGo-09), Bled, Slovenia (pp. 36–50).

  • Cendrowska, J. (1987). PRISM: An algorithm for inducing modular rules. International Journal of Man-Machine Studies, 27, 349–370.

  • Clark, P., & Boswell, R. (1991). Rule induction with CN2: Some recent improvements. In Proceedings of the 5th European Working Session on Learning (EWSL-91), Porto, Portugal (pp. 151–163). Berlin, Germany: Springer.

  • Clark, P., & Niblett, T. (1987). Induction in noisy domains. In I. Bratko & N. Lavrač (Eds.), Progress in Machine Learning. Wilmslow, UK: Sigma Press.

  • Clark, P., & Niblett, T. (1989). The CN2 induction algorithm. Machine Learning, 3(4), 261–283.

  • Cohen, W. W. (1995). Fast effective rule induction. In A. Prieditis & S. Russell (Eds.), Proceedings of the 12th International Conference on Machine Learning (ML-95), Lake Tahoe, CA (pp. 115–123). San Francisco: Morgan Kaufmann.

  • Cootes, A. P., Muggleton, S. H., & Sternberg, M. J. (2003). The automatic discovery of structural principles describing protein fold space. Journal of Molecular Biology, 330(4), 527–532.

  • De Raedt, L., & Van Laer, W. (1995). Inductive constraint logic. In K. Jantke, T. Shinohara, & T. Zeugmann (Eds.), Proceedings of the 5th Workshop on Algorithmic Learning Theory (ALT-95), Fukuoka, Japan (pp. 80–94). Berlin, Germany/New York: Springer.

  • Džeroski, S., & Bratko, I. (1992). Handling noise in inductive logic programming. In S. H. Muggleton & K. Furukawa (Eds.), Proceedings of the 2nd International Workshop on Inductive Logic Programming (ILP-92) (pp. 109–125). No. TM-1182 in ICOT Technical Memorandum, Institute for New Generation Computer Technology, Tokyo, Japan.

  • Džeroski, S., Cestnik, B., & Petrovski, I. (1993). Using the m-estimate in rule induction. Journal of Computing and Information Technology, 1, 37–46.

  • Flach, P., & Lavrač, N. (2003). Rule induction. In M. Berthold & D. J. Hand (Eds.), Intelligent data analysis (2nd ed., pp. 229–267). Berlin, Germany/New York: Springer.

  • Frank, A., & Asuncion, A. (2010). UCI machine learning repository. Irvine, CA: University of California, School of Information and Computer Science.

  • Frank, E., & Witten, I. H. (1998). Generating accurate rule sets without global optimization. In J. Shavlik (Ed.), Proceedings of the 15th International Conference on Machine Learning (ICML-98), Madison, WI (pp. 144–151). San Francisco: Morgan Kaufmann.

  • Friedman, J. H., & Fisher, N. I. (1999). Bump hunting in high-dimensional data. Statistics and Computing, 9(2), 123–143.

  • Fürnkranz, J. (1994a). Fossil: A robust relational learner. In F. Bergadano & L. De Raedt (Eds.), Proceedings of the 7th European Conference on Machine Learning (ECML-94), Catania, Italy (pp. 122–137). Berlin, Germany/New York: Springer.

  • Hühn, J., & Hüllermeier, E. (2009b). Furia: An algorithm for unordered fuzzy rule induction. Data Mining and Knowledge Discovery, 19(3), 293–319.

  • Joshi, S., Ramakrishnan, G., & Srinivasan, A. (2008). Feature construction using theory-guided sampling and randomised search. In F. Zelezný & N. Lavrac (Eds.), Proceedings of the 18th International Conference on Inductive Logic Programming (ILP-08), Prague, Czech Republic (pp. 140–157). Berlin, Germany/New York: Springer.

  • Jovanoski, V., & Lavrač, N. (2001). Classification rule learning with APRIORI-C. In P. Brazdil & A. Jorge (Eds.), Proceedings of the 10th Portuguese Conference on Artificial Intelligence (EPIA 2001), Porto, Portugal (pp. 44–51). Berlin, Germany/New York: Springer.

  • Kaufman, K. A., & Michalski, R. S. (2000). An adjustable rule learner for pattern discovery using the AQ methodology. Journal of Intelligent Information Systems, 14, 199–216.

  • King, R. D., Whelan, K. E., Jones, F. M., Reiser, P., Bryant, C., & Muggleton, S., et al. (2004). Functional genomic hypothesis generation and experimentation by a robot. Nature, 427, 247–252.

  • Kramer, S., Lavrač, N., & Flach, P. (2001). Propositionalization approaches to relational data mining. In S. Džeroski & N. Lavrač (Eds.), Relational data mining (pp. 262–291). Berlin, Germany: Springer.

  • Lavrač, N., & Džeroski, S. (1994a). Inductive logic programming: Techniques and applications. New York: Ellis Horwood.

  • Lavrač, N., Džeroski, S., & Grobelnik, M. (1991). Learning nonrecursive definitions of relations with LINUS. In Proceedings of the 5th European Working Session on Learning (EWSL-91), Porto, Portugal (pp. 265–281). Berlin, Germany: Springer.

  • Li, W., Han, J., & Pei, J. (2001). CMAR: Accurate and efficient classification based on multiple class-association rules. In Proceedings of the IEEE Conference on Data Mining (ICDM-01), San Jose, CA (pp. 369–376). Los Alamitos, CA: IEEE.

  • Liu, B., Hsu, W., & Ma, Y. (1998). Integrating classification and association rule mining. In R. Agrawal, P. Stolorz, & G. Piatetsky-Shapiro (Eds.), Proceedings of the 4th International Conference on Knowledge Discovery and Data Mining (KDD-98) (pp. 80–86). Menlo Park, CA: AAAI.

  • Liu, B., Ma, Y., & Wong, C.-K. (2000). Improving an exhaustive search based rule learner. In D. A. Zighed, H. J. Komorowski, & J. M. Zytkow (Eds.), Proceedings of the 4th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD-2000), Lyon, France (pp. 504–509). Berlin, Germany: Springer.

  • Michalski, R. S. (1969). On the quasi-minimal solution of the covering problem. In Proceedings of the 5th International Symposium on Information Processing (FCIP-69), Bled, Yugoslavia (Switching circuits, Vol. A3, pp. 125–128).

  • Michalski, R. S. (1980). Pattern recognition and rule-guided inference. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2, 349–361.

  • Michalski, R. S., & Larson, J. B. (1978). Selection of most representative training examples and incremental generation of VL1 hypotheses: the underlying methodology and the description of programs ESEL and AQ11 (Tech. Rep. 78-867). Department of Computer Science, University of Illinois at Urbana-Champaign.

  • Michalski, R. S., Mozetič, I., Hong, J., & Lavrač, N. (1986). The multi-purpose incremental learning system AQ15 and its testing application to three medical domains. In Proceedings of the 5th National Conference on Artificial Intelligence (AAAI-86), Philadelphia (pp. 1041–1045). Menlo Park, CA: AAAI.

  • Mitchell, T. M. (1982). Generalization as search. Artificial Intelligence, 18(2), 203–226.

  • Mitchell, T. M. (1997). Machine learning. New York: McGraw Hill.

  • Mooney, R. J. (1995). Encouraging experimental results on learning CNF. Machine Learning, 19, 79–92.

  • Muggleton, S. H. (1995). Inverse entailment and Progol. New Generation Computing, 13(3,4), 245–286. Special Issue on Inductive Logic Programming.

  • Muggleton, S. H., & Firth, J. (2001). Relational rule induction with CProgol4.4: A tutorial introduction. In S. Džeroski & N. Lavrač (Eds.), Relational data mining (pp. 160–188). Berlin, Germany: Springer. Chap. 7.

  • Mutter, S., Hall, M., & Frank, E. (2004). Using classification to evaluate the output of confidence-based association rule mining. In G. I. Webb & X. Yu (Eds.), Proceedings of the Australian Joint Conference on Artificial Intelligence (AI-04), Cairns, QLD (pp. 538–549). Berlin, Germany: Springer.

  • Pagallo, G., & Haussler, D. (1990). Boolean feature discovery in empirical learning. Machine Learning, 5, 71–99.

  • Quinlan, J. R. (1983). Learning efficient classification procedures and their application to chess end games. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell, (Eds.), Machine learning. An artificial intelligence approach (pp. 463–482). Palo Alto, CA: Tioga.

  • Quinlan, J. R. (1987a). Generating production rules from decision trees. In Proceedings of the 10th International Joint Conference on Artificial Intelligence (IJCAI-87) (pp. 304–307). Los Altos, CA: Morgan Kaufmann.

  • Quinlan, J. R. (1990). Learning logical definitions from relations. Machine Learning, 5, 239–266.

  • Quinlan, J. R. (1991). Determinate literals in inductive logic programming. In Proceedings of the 8th International Workshop on Machine Learning (ML-91) (pp. 442–446). San Mateo, CA: Morgan Kaufmann

  • Quinlan, J. R. (1993). C4.5: Programs for machine learning. San Mateo, CA: Morgan Kaufmann.

  • Quinlan, J. R., & Cameron-Jones, R. M. (1995a). Induction of logic programs: FOIL and related systems. New Generation Computing, 13(3,4), 287–312. Special Issue on Inductive Logic Programming.

  • Ramakrishnan, G., Joshi, S., Balakrishnan, S., & Srinivasan, A. (2008). Using ILP to construct features for information extraction from semi-structured text. In H. Blockeel, J. Ramon, J. W. Shavlik, & P. Tadepalli (Eds.), Proceedings of the 17th International Conference on Inductive Logic Programming (ILP-07), Corvallis, OR (pp. 211–224). Springer.

  • Rivest, R. L. (1987). Learning decision lists. Machine Learning, 2, 229–246.

  • Soares, C. (2003). Is the UCI repository useful for data mining? In F. Moura-Pires & S. Abreu (Eds.), Proceedings of the 11th Portuguese Conference on Artificial Intelligence (EPIA-03), Beja, Portugal (pp. 209–223). Berlin, Germany/Heidelberg, Germany: Springer.

  • Specia, L., Srinivasan, A., Joshi, S., Ramakrishnan, G., & das Graças Volpe Nunes, M. (2009). An investigation into feature construction to assist word sense disambiguation. Machine Learning, 76(1), 109–136.

  • Srinivasan, A. (1999). The Aleph manual. http://web.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/.

  • Srinivasan, A., & King, R. D. (1997). Feature construction with inductive logic programming: A study of quantitative predictions of biological activity by structural attributes. In S. Muggleton (Ed.), Proceedings of the 6th International Workshop, on Inductive Logic Programming (ILP-96), Stockholm (pp. 89–104). Berlin, Germany/New York: Springer.

  • Sternberg, M. J., & Muggleton, S. H. (2003). Structure activity relationships (SAR) and pharmacophore discovery using inductive logic programming (ILP). QSAR and Combinatorial Science, 22(5), 527–532.

  • Theron, H., & Cloete, I. (1996). BEXA: A covering algorithm for learning propositional concept descriptions. Machine Learning, 24, 5–40.

  • Webb, G. I. (1995). OPUS: An efficient admissible algorithm for unordered search. Journal of Artificial Intelligence Research, 5, 431–465.

  • Webb, G. I. (2000). Efficient search for association rules. In Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2000), Boston (pp. 99–107). New York: ACM.

  • Webb, G. I., & Zhang, S. (2005). k-optimal rule discovery. Data Mining and Knowledge Discovery, 10(1), 39–79.

  • Witten, I. H., & Frank, E. (2005). Data mining: Practical machine learning tools and techniques with Java implementations (2nd ed.). Amsterdam/Boston: Morgan Kaufmann Publishers.

  • Yin, X., & Han, J. (2003). CPAR: Classification based on predictive association rules. In D. Barbará & C. Kamath (Eds.), Proceedings of the SIAM Conference on Data Mining (SDM-03) (pp. 331–335). Philadelphia: SIAM.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Fürnkranz, J., Gamberger, D., Lavrač, N. (2012). Rule Learning in a Nutshell. In: Foundations of Rule Learning. Cognitive Technologies. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75197-7_2

  • DOI: https://doi.org/10.1007/978-3-540-75197-7_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75196-0

  • Online ISBN: 978-3-540-75197-7
