Beyond Concept Learning

Foundations of Rule Learning

Part of the book series: Cognitive Technologies ((COGTECH))

Abstract

So far, we have mostly assumed a concept learning framework, in which the learner’s task is to learn a rule set describing the target concept from a set of positive and negative examples of this concept. In this chapter, we discuss approaches that extend this framework. We start with multiclass problems, which commonly occur in practice, and discuss the most popular methods for handling them: one-against-all classification and pairwise classification. We also discuss error-correcting output codes as a general framework for reducing multiclass problems to binary classification. As many prediction problems have complex, structured output variables, we also present label ranking and show how a generalization of pairwise classification can address this problem and related problems such as multilabel, hierarchical, and ordered classification. General ranking problems, in particular methods for optimizing the area under the ROC curve, are also addressed in this chapter. Finally, we briefly review rule learning approaches to regression and clustering.
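As a rough illustration of the two decomposition schemes named above, one-against-all trains one binary task per class, while pairwise classification trains one task per unordered pair of classes and aggregates their votes. This is a minimal sketch; the nearest-centroid base learner and all function names are hypothetical stand-ins, not from the book, for any learner that handles binary concept-learning problems.

```python
# Sketch of the two multiclass decomposition schemes: one-against-all
# and pairwise (one-vs-one) classification, with a toy binary learner.

from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+

class CentroidBinary:
    """Toy binary learner: predict +1 if closer to the positive centroid."""
    def fit(self, X, y):                      # y contains +1 / -1 labels
        pos = [x for x, label in zip(X, y) if label == +1]
        neg = [x for x, label in zip(X, y) if label == -1]
        self.pos = tuple(sum(c) / len(pos) for c in zip(*pos))
        self.neg = tuple(sum(c) / len(neg) for c in zip(*neg))
        return self
    def predict(self, x):
        return +1 if dist(x, self.pos) < dist(x, self.neg) else -1

def one_vs_all(X, y, classes):
    """One binary task per class: examples of class c vs. all others."""
    return {c: CentroidBinary().fit(X, [+1 if l == c else -1 for l in y])
            for c in classes}

def one_vs_one(X, y, classes):
    """One binary task per unordered pair of classes (round robin)."""
    models = {}
    for ci, cj in combinations(classes, 2):
        pairs = [(x, +1 if l == ci else -1) for x, l in zip(X, y)
                 if l in (ci, cj)]
        Xp, yp = zip(*pairs)
        models[(ci, cj)] = CentroidBinary().fit(Xp, yp)
    return models

def predict_pairwise(models, x, classes):
    """Each pairwise model votes for one class; the class with most votes wins."""
    votes = {c: 0 for c in classes}
    for (ci, cj), m in models.items():
        votes[ci if m.predict(x) == +1 else cj] += 1
    return max(classes, key=votes.get)
```

On a toy three-class problem, `predict_pairwise` lets each of the three pairwise models cast one vote and predicts the class with the most votes, while `one_vs_all` yields one detector per class.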


Notes

  1.

    Parts of this chapter are based on Fürnkranz (2002b), Park and Fürnkranz (2009) and Fürnkranz and Hüllermeier (2010b).

  2.

    Recall that \(\hat{\pi }_{i} = \frac{\hat{P}_{i}}{\hat{E}}\) is the proportion of covered examples that belong to class \(c_{i}\).
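As a quick worked illustration of this proportion (the numbers are hypothetical, not taken from the book):

```python
# Illustrative numbers only: suppose a rule covers E_hat = 20 examples,
# of which P_hat_i = 12 belong to class c_i.
P_hat_i = 12          # covered examples of class c_i
E_hat = 20            # all covered examples
pi_hat_i = P_hat_i / E_hat
print(pi_hat_i)       # 0.6
```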

  3.

    Here we refer to William Cohen’s original C implementation of the algorithm. At the time of this writing, JRip, the more accessible Weka reimplementation of Ripper, does not support these options.

  4.

    Stacking (Wolpert, 1992) denotes a family of techniques that use the predictions of a set of classifiers as inputs for a meta-level classifier that makes the final prediction.
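A minimal sketch of this idea (all names here are illustrative stand-ins; real stacking would train an arbitrary induction algorithm at the meta level): the base classifiers' predictions on each training example form the meta-level feature vector.

```python
# Sketch of stacking (Wolpert, 1992): the predictions of base classifiers
# become the input features of a meta-level classifier. The meta-learner
# here is a toy majority-vote lookup table over prediction vectors.

from collections import Counter

def train_meta(base_classifiers, X, y):
    """Build a lookup-table meta-classifier over base-prediction vectors."""
    table = {}
    for x, label in zip(X, y):
        key = tuple(clf(x) for clf in base_classifiers)  # meta-level features
        table.setdefault(key, []).append(label)
    # For each observed prediction vector, predict the majority label.
    return {key: Counter(labels).most_common(1)[0][0]
            for key, labels in table.items()}

def predict_stacked(base_classifiers, meta_table, x, default=None):
    key = tuple(clf(x) for clf in base_classifiers)
    return meta_table.get(key, default)
```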

  5.

    Bagging (Breiman, 1996) is a popular ensemble technique that trains a set of classifiers, each on a sample of the training data generated by sampling uniformly with replacement. The predictions of these classifiers are then combined, which often yields better practical performance than a single classifier.
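The procedure can be sketched as follows; the one-nearest-neighbour base learner and all names are illustrative stand-ins, not from the book.

```python
# Sketch of bagging (Breiman, 1996): train each ensemble member on a
# bootstrap sample (uniform sampling with replacement) of the training
# data, then combine predictions by majority vote.

import random
from collections import Counter

def one_nn(sample):
    """Toy base learner: classify by the nearest training example."""
    def predict(x):
        return min(sample, key=lambda ex: abs(ex[0] - x))[1]
    return predict

def bagging(data, n_models=11, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        boot = [rng.choice(data) for _ in range(len(data))]  # with replacement
        models.append(one_nn(boot))
    def predict(x):  # majority vote over the ensemble members
        return Counter(m(x) for m in models).most_common(1)[0][0]
    return predict
```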

  6.

    http://www.rulequest.com/cubist-info.html

References

  • Ali, K. M., & Pazzani, M. J. (1993). HYDRA: A noise-tolerant relational concept learning algorithm. In R. Bajcsy (Ed.), Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI-93), Chambéry, France (pp. 1064–1071). San Mateo, CA: Morgan Kaufmann.

  • Allwein, E. L., Schapire, R. E., & Singer, Y. (2000). Reducing multiclass to binary: A unifying approach for margin classifiers. Journal of Machine Learning Research, 1, 113–141.

  • Bisson, G. (1992). Conceptual clustering in a first order logic representation. In B. Neumann (Ed.), Proceedings of the 10th European Conference on Artificial Intelligence (ECAI-92), Vienna (pp. 458–462). Chichester, UK/New York: Wiley.

  • Blaszczynski, J., Stefanowski, J., & Zajac, M. (2009). Ensembles of abstaining classifiers based on rule sets. In J. Rauch, Z. W. Ras, P. Berka, & T. Elomaa (Eds.), Proceedings of the 18th International Symposium on Foundations of Intelligent Systems (ISMIS-09), Prague, Czech Republic (pp. 382–391). Berlin, Germany: Springer.

  • Blockeel, H., De Raedt, L., & Ramon, J. (1998). Top-down induction of clustering trees. In J. Shavlik (Ed.), Proceedings of the 15th International Conference on Machine Learning, Madison, WI (pp. 55–63). San Francisco: Morgan Kaufmann.

  • Bose, R. C., & Ray Chaudhuri, D. K. (1960). On a class of error correcting binary group codes. Information and Control, 3(1), 68–79.

  • Boström, H. (2007). Maximizing the area under the ROC curve with decision lists and rule sets. In Proceedings of the 7th SIAM International Conference on Data Mining (SDM-07), Minneapolis, MN (pp. 27–34). Philadelphia: SIAM.

  • Bradley, R. A., & Terry, M. E. (1952). The rank analysis of incomplete block designs—I. The method of paired comparisons. Biometrika, 39, 324–345.

  • Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140.

  • Breiman, L., Friedman, J. H., Olshen, R., & Stone, C. (1984). Classification and regression trees. Pacific Grove, CA: Wadsworth & Brooks.

  • Cardoso, J. S., & da Costa, J. F. P. (2007). Learning to classify ordinal data: The data replication method. Journal of Machine Learning Research, 8, 1393–1429.

  • Clark, P., & Boswell, R. (1991). Rule induction with CN2: Some recent improvements. In Proceedings of the 5th European Working Session on Learning (EWSL-91), Porto, Portugal (pp. 151–163). Berlin, Germany: Springer.

  • Cohen, W. W., Schapire, R. E., & Singer, Y. (1999). Learning to order things. Journal of Artificial Intelligence Research, 10, 243–270.

  • Cook, D. J., & Holder, L. B. (1994). Substructure discovery using minimum description length and background knowledge. Journal of Artificial Intelligence Research, 1, 231–255.

  • Crammer, K., & Singer, Y. (2002). On the learnability and design of output codes for multiclass problems. Machine Learning, 47(2–3), 201–233.

  • Davis, J., Burnside, E., Castro Dutra, I. d., Page, D., & Santos Costa, V. (2004). Using Bayesian classifiers to combine rules. In Proceedings of the 3rd SIGKDD Workshop on Multi-Relational Data Mining (MRDM-04), Seattle, WA.

  • Dekel, O., Manning, C. D., & Singer, Y. (2004). Log-linear models for label ranking. In S. Thrun, L. K. Saul, & B. Schölkopf (Eds.), Advances in neural information processing systems (NIPS-03) (pp. 497–504). Cambridge, MA: MIT.

  • Dembczyński, K., Kotłowski, W., & Słowiński, R. (2008). Solving regression by learning an ensemble of decision rules. In L. Rutkowski, R. Tadeusiewicz, L. A. Zadeh, & J. M. Zurada (Eds.), Proceedings of the 9th International Conference on Artificial Intelligence and Soft Computing (ICAISC-08), Zakopane, Poland (pp. 533–544). Berlin, Germany/New York: Springer.

  • Dietterich, T. G., & Bakiri, G. (1995). Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2, 263–286.

  • Eineborg, M., & Boström, H. (2001). Classifying uncovered examples by rule stretching. In C. Rouveirol & M. Sebag (Eds.), Proceedings of the 11th International Conference on Inductive Logic Programming (ILP-01), Strasbourg, France (pp. 41–50). Berlin, Germany/New York: Springer.

  • Escalera, S., Pujol, O., & Radeva, P. (2006). Decoding of ternary error correcting output codes. In J. F. M. Trinidad, J. A. Carrasco-Ochoa, & J. Kittler (Eds.), Proceedings of the 11th Iberoamerican Congress in Pattern Recognition (CIARP-06), Cancun, Mexico (pp. 753–763). Berlin, Germany/Heidelberg, Germany/New York: Springer.

  • Fawcett, T. E. (2001). Using rule sets to maximize ROC performance. In Proceedings of the IEEE International Conference on Data Mining (ICDM-01), San Jose, CA (pp. 131–138). Los Alamitos, CA: IEEE.

  • Fawcett, T. E. (2008). PRIE: A system for generating rulelists to maximize ROC performance. Data Mining and Knowledge Discovery, 17(2), 207–224.

  • Fisher, D. H. (1987). Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2(2), 139–172.

  • Fodor, J., & Roubens, M. (1994). Fuzzy preference modelling and multicriteria decision support. Dordrecht, The Netherlands/Boston: Kluwer.

  • Frank, A., & Asuncion, A. (2010). UCI machine learning repository. Irvine, CA: University of California, School of Information and Computer Science.

  • Frank, E., & Hall, M. (2001). A simple approach to ordinal classification. In L. De Raedt & P. Flach (Eds.), Proceedings of the 12th European Conference on Machine Learning (ECML-01), Freiburg, Germany (pp. 145–156). Berlin, Germany/New York: Springer.

  • Frank, E., & Witten, I. H. (1998). Generating accurate rule sets without global optimization. In J. Shavlik (Ed.), Proceedings of the 15th International Conference on Machine Learning (ICML-98), Madison, WI (pp. 144–151). San Francisco: Morgan Kaufmann.

  • Friedman, J. H. (1996). Another approach to polychotomous classification (Tech. rep.). Stanford, CA: Department of Statistics, Stanford University.

  • Friedman, N., Geiger, D., & Goldszmidt, M. (1997). Bayesian network classifiers. Machine Learning, 29, 131–161.

  • Friedman, J. H., & Popescu, B. E. (2008). Predictive learning via rule ensembles. Annals of Applied Statistics, 2, 916–954.

  • Fürnkranz, J. (2002b). Round robin classification. Journal of Machine Learning Research, 2, 721–747.

  • Fürnkranz, J. (2003). Round robin ensembles. Intelligent Data Analysis, 7(5), 385–404.

  • Fürnkranz, J., & Flach, P. (2005). ROC ’n’ rule learning – Towards a better understanding of covering algorithms. Machine Learning, 58(1), 39–77.

  • Fürnkranz, J., & Hüllermeier, E. (2003). Pairwise preference learning and ranking. In N. Lavrač, D. Gamberger, H. Blockeel, & L. Todorovski (Eds.), Proceedings of the 14th European Conference on Machine Learning (ECML-03), Cavtat, Croatia (pp. 145–156). Berlin, Germany/New York: Springer.

  • Fürnkranz, J., & Hüllermeier, E. (Eds.). (2010a). Preference learning. Heidelberg, Germany/New York: Springer.

  • Fürnkranz, J., & Hüllermeier, E. (2010b). Preference learning and ranking by pairwise comparison. In J. Fürnkranz & E. Hüllermeier (Eds.), Preference learning (pp. 65–82). Heidelberg, Germany/New York: Springer.

  • Fürnkranz, J., Hüllermeier, E., Loza Mencía, E., & Brinker, K. (2008). Multilabel classification via calibrated label ranking. Machine Learning, 73(2), 133–153.

  • Fürnkranz, J., Hüllermeier, E., & Vanderlooy, S. (2009). Binary decomposition methods for multipartite ranking. In W. L. Buntine, M. Grobelnik, D. Mladenić, & J. Shawe-Taylor (Eds.), Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD-09), Bled, Slovenia (Vol. Part I, pp. 359–374). Berlin, Germany: Springer.

  • Fürnkranz, J., & Sima, J. F. (2010). On exploiting hierarchical label structure with pairwise classifiers. SIGKDD Explorations, 12(2), 21–25. Special Issue on Mining Unexpected Results.

  • Gamberger, D., Lavrač, N., & Krstačić, G. (2002). Confirmation rule induction and its applications to coronary heart disease diagnosis and risk group discovery. Journal of Intelligent and Fuzzy Systems, 12(1), 35–48.

  • Ghani, R. (2000). Using error-correcting codes for text classification. In Proceedings of the 17th International Conference on Machine Learning (ICML-00) (pp. 303–310). San Francisco: Morgan Kaufmann.

  • Gönen, M., & Heller, G. (2005). Concordance probability and discriminatory power in proportional hazards regression. Biometrika, 92(4), 965–970.

  • Har-Peled, S., Roth, D., & Zimak, D. (2002). Constraint classification: A new approach to multiclass classification. In N. Cesa-Bianchi, M. Numao, & R. Reischuk (Eds.), Proceedings of the 13th International Conference on Algorithmic Learning Theory (ALT-02), Lübeck, Germany (pp. 365–379). Berlin, Germany/New York: Springer.

  • Hastie, T., & Tibshirani, R. (1998). Classification by pairwise coupling. In M. Jordan, M. Kearns, & S. Solla (Eds.), Advances in neural information processing systems 10 (NIPS-97) (pp. 507–513). Cambridge, MA: MIT.

  • Hocquenghem, A. (1959). Codes correcteurs d’erreurs. Chiffres, 2, 147–156. In French.

  • Holmes, G., Hall, M., & Frank, E. (1999). Generating rule sets from model trees. In N. Y. Foo (Ed.), Proceedings of the 12th Australian Joint Conference on Artificial Intelligence (AI-99), Sydney, Australia (pp. 1–12). Berlin, Germany/New York: Springer.

  • Hsu, C.-W., & Lin, C.-J. (2002). A comparison of methods for multi-class support vector machines. IEEE Transactions on Neural Networks, 13(2), 415–425.

  • Hühn, J., & Hüllermeier, E. (2009a). FR3: A fuzzy rule learner for inducing reliable classifiers. IEEE Transactions on Fuzzy Systems, 17(1), 138–149.

  • Hüllermeier, E., & Fürnkranz, J. (2010). On predictive accuracy and risk minimization in pairwise label ranking. Journal of Computer and System Sciences, 76(1), 49–62.

  • Hüllermeier, E., Fürnkranz, J., Cheng, W., & Brinker, K. (2008). Label ranking by learning pairwise preferences. Artificial Intelligence, 172, 1897–1916.

  • Janssen, F., & Fürnkranz, J. (2011). Heuristic rule-based regression via dynamic reduction to classification. In T. Walsh (Ed.), Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI-11), Barcelona, Spain (pp. 1330–1335). Menlo Park, CA: AAAI.

  • Joachims, T. (2002). Optimizing search engines using clickthrough data. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-02), Edmonton, AB (pp. 133–142). New York: ACM.

  • Joachims, T. (2006). Training linear SVMs in linear time. In T. Eliassi-Rad, L. H. Ungar, M. Craven, & D. Gunopulos (Eds.), Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-06), Philadelphia (pp. 217–226). New York: ACM.

  • Karalič, A., & Bratko, I. (1997). First order regression. Machine Learning, 26(2/3), 147–176. Special Issue on Inductive Logic Programming.

  • Kittler, J., Ghaderi, R., Windeatt, T., & Matas, J. (2003). Face verification via error correcting output codes. Image and Vision Computing, 21(13–14), 1163–1169.

  • Knerr, S., Personnaz, L., & Dreyfus, G. (1990). Single-layer learning revisited: A stepwise procedure for building and training a neural network. In F. Fogelman Soulié & J. Hérault (Eds.), Neurocomputing: Algorithms, architectures and applications (NATO ASI Series, Vol. F68, pp. 41–50). Berlin, Germany/New York: Springer.

  • Knerr, S., Personnaz, L., & Dreyfus, G. (1992). Handwritten digit recognition by neural networks with single-layer training. IEEE Transactions on Neural Networks, 3(6), 962–968.

  • Koller, D., & Sahami, M. (1997). Hierarchically classifying documents using very few words. In Proceedings of the 14th International Conference on Machine Learning (ICML-97), Nashville, TN (pp. 170–178). San Francisco: Morgan Kaufmann.

  • Kong, E. B., & Dietterich, T. G. (1995). Error-correcting output coding corrects bias and variance. In Proceedings of the 12th International Conference on Machine Learning (ICML-95) (pp. 313–321). San Mateo, CA: Morgan Kaufmann.

  • Kramer, S. (1996). Structural regression trees. In Proceedings of the 13th National Conference on Artificial Intelligence (AAAI-96) (pp. 812–819). Menlo Park, CA: AAAI.

  • Kreßel, U. H.-G. (1999). Pairwise classification and support vector machines. In B. Schölkopf, C. Burges, & A. Smola (Eds.), Advances in kernel methods: Support vector learning (pp. 255–268). Cambridge, MA: MIT. Chap. 15.

  • Landwehr, N., Kersting, K., & De Raedt, L. (2007). Integrating Naive Bayes and FOIL. Journal of Machine Learning Research, 8, 481–507.

  • Langford, J., Oliveira, R., & Zadrozny, B. (2006). Predicting conditional quantiles via reduction to classification. In Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI-06), Cambridge, MA (pp. 257–264). Arlington, VA: AUAI.

  • Lindgren, T., & Boström, H. (2004). Resolving rule conflicts with double induction. Intelligent Data Analysis, 8(5), 457–468.

  • Loza Mencía, E., Park, S.-H., & Fürnkranz, J. (2009). Efficient voting prediction for pairwise multilabel classification. In Proceedings of the 17th European Symposium on Artificial Neural Networks (ESANN-09), Bruges, Belgium (pp. 117–122). Evere, Belgium: d-side publications.

  • Lu, B.-L., & Ito, M. (1999). Task decomposition and module combination based on class relations: A modular neural network for pattern classification. IEEE Transactions on Neural Networks, 10(5), 1244–1256.

  • MacWilliams, F. J., & Sloane, N. J. A. (1983). The theory of error-correcting codes. Amsterdam, The Netherlands: North-Holland.

  • Melvin, I., Ie, E., Weston, J., Noble, W. S., & Leslie, C. (2007). Multi-class protein classification using adaptive codes. Journal of Machine Learning Research, 8, 1557–1581.

  • Michalski, R. S. (1969). On the quasi-minimal solution of the covering problem. In Proceedings of the 5th International Symposium on Information Processing (FCIP-69), Bled, Yugoslavia (Switching circuits, Vol. A3, pp. 125–128).

  • Michalski, R. S. (1980). Pattern recognition and rule-guided inference. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2, 349–361.

  • Michalski, R. S., & Stepp, R. E. (1983). Learning from observation: Conceptual clustering. In R. Michalski, J. Carbonell, & T. Mitchell (Eds.), Machine learning: An artificial intelligence approach. Palo Alto, CA: Tioga.

  • Mooney, R. J., & Califf, M. E. (1995). Induction of first-order decision lists: Results on learning the past tense of English verbs. Journal of Artificial Intelligence Research, 3, 1–24.

  • Park, S.-H., & Fürnkranz, J. (2007). Efficient pairwise classification. In J. N. Kok, J. Koronacki, R. López de Mántaras, S. Matwin, D. Mladenić, & A. Skowron (Eds.), Proceedings of the 18th European Conference on Machine Learning (ECML-07), Warsaw, Poland (pp. 658–665). Berlin, Germany/New York: Springer.

  • Park, S.-H., & Fürnkranz, J. (2009). Efficient decoding of ternary error-correcting output codes for multiclass classification. In W. L. Buntine, M. Grobelnik, D. Mladenić, & J. Shawe-Taylor (Eds.), Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD-09), Bled, Slovenia (Vol. Part II, pp. 189–204). Berlin, Germany: Springer.

  • Pazzani, M., Merz, C. J., Murphy, P., Ali, K., Hume, T., & Brunk, C. (1994). Reducing misclassification costs. In W. W. Cohen & H. Hirsh (Eds.), Proceedings of the 11th International Conference on Machine Learning (ML-94) (pp. 217–225). New Brunswick, NJ: Morgan Kaufmann.

  • Pelleg, D., & Moore, A. (2001). Mixtures of rectangles: Interpretable soft clustering. In C. E. Brodley & A. P. Danyluk (Eds.), Proceedings of the 18th International Conference on Machine Learning (ICML-01), Williamstown, MA (pp. 401–408). San Francisco: Morgan Kaufmann.

  • Pietraszek, T. (2007). On the use of ROC analysis for the optimization of abstaining classifiers. Machine Learning, 68(2), 137–169.

  • Pimenta, E., Gama, J., & de Leon Ferreira de Carvalho, A. C. P. (2008). The dimension of ECOCs for multiclass classification problems. International Journal on Artificial Intelligence Tools, 17(3), 433–447.

  • Platt, J. C., Cristianini, N., & Shawe-Taylor, J. (2000). Large margin DAGs for multiclass classification. In S. A. Solla, T. K. Leen, & K.-R. Müller (Eds.), Advances in neural information processing systems 12 (NIPS-99) (pp. 547–553). Cambridge, MA/London: MIT.

  • Prati, R. C., & Flach, P. A. (2005). Roccer: An algorithm for rule learning based on ROC analysis. In L. P. Kaelbling & A. Saffiotti (Eds.), Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI-05), Edinburgh, UK (pp. 823–828). Professional Book Center.

  • Price, D., Knerr, S., Personnaz, L., & Dreyfus, G. (1995). Pairwise neural network classifiers with probabilistic outputs. In G. Tesauro, D. Touretzky, & T. Leen (Eds.), Advances in neural information processing systems 7 (NIPS-94) (pp. 1109–1116). Cambridge, MA: MIT.

  • Pujol, O., Radeva, P., & Vitriá, J. (2006). Discriminant ECOC: A heuristic method for application dependent design of error correcting output codes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(6), 1007–1012.

  • Quevedo, J. R., Montañés, E., Luaces, O., & del Coz, J. J. (2010). Adapting decision DAGs for multipartite ranking. In J. L. Balcázar, F. Bonchi, A. Gionis, & M. Sebag (Eds.), Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML/PKDD-10), Barcelona, Spain (Part III, pp. 115–130). Berlin, Germany/Heidelberg, Germany: Springer.

  • Quinlan, J. R. (1987a). Generating production rules from decision trees. In Proceedings of the 10th International Joint Conference on Artificial Intelligence (IJCAI-87) (pp. 304–307). Los Altos, CA: Morgan Kaufmann.

  • Quinlan, J. R. (1992). Learning with continuous classes. In N. Adams & L. Sterling (Eds.), Proceedings of the 5th Australian Joint Conference on Artificial Intelligence, Hobart, TAS (pp. 343–348). Singapore: World Scientific.

  • Quinlan, J. R. (1993). C4.5: Programs for machine learning. San Mateo, CA: Morgan Kaufmann.

  • Rissanen, J. (1978). Modeling by shortest data description. Automatica, 14, 465–471.

  • Salzberg, S. (1991). A nearest hyperrectangle learning method. Machine Learning, 6, 251–276.

  • Schmidt, M. S., & Gish, H. (1996). Speaker identification via support vector classifiers. In Proceedings of the 21st IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-96), Atlanta, GA (pp. 105–108). Piscataway, NJ: IEEE.

  • Smola, A. J., & Schölkopf, B. (2004). A tutorial on support vector regression. Statistics and Computing, 14, 199–222.

  • Stepp, R. E., & Michalski, R. S. (1986). Conceptual clustering of structured objects: A goal-oriented approach. Artificial Intelligence, 28(1), 43–69.

  • Sulzmann, J.-N., & Fürnkranz, J. (2011). Rule stacking: An approach for compressing an ensemble of rule sets into a single classifier. In T. Elomaa, J. Hollmén, & H. Mannila (Eds.), Proceedings of the 14th International Conference on Discovery Science (DS-11), Espoo, Finland (pp. 323–334). Berlin, Germany/New York: Springer.

  • Torgo, L. (1995). Data fitting with rule-based regression. In J. Zizka & P. B. Brazdil (Eds.), Proceedings of the 2nd International Workshop on Artificial Intelligence Techniques (AIT-95). Brno, Czech Republic: Springer.

  • Torgo, L., & Gama, J. (1997). Regression using classification algorithms. Intelligent Data Analysis, 1(4), 275–292.

  • Van Horn, K. S., & Martinez, T. R. (1993). The BBG rule induction algorithm. In Proceedings of the 6th Australian Joint Conference on Artificial Intelligence (AI-93), Melbourne, VIC (pp. 348–355). Singapore: World Scientific.

  • Webb, G. I. (1994). Recent progress in learning decision lists by prepending inferred rules. In Proceedings of the 2nd Singapore International Conference on Intelligent Systems (pp. B280–B285). Singapore: World Scientific.

  • Webb, G. I., & Brkič, N. (1993). Learning decision lists by prepending inferred rules. In Proceedings of the AI’93 Workshop on Machine Learning and Hybrid Systems, Melbourne, VIC (pp. 6–10).

  • Weiss, S. M., & Indurkhya, N. (1995). Rule-based machine learning methods for functional prediction. Journal of Artificial Intelligence Research, 3, 383–403.

  • Windeatt, T., & Ghaderi, R. (2003). Coding and decoding strategies for multi-class learning problems. Information Fusion, 4(1), 11–21.

  • Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5(2), 241–260.

  • Wu, T.-F., Lin, C.-J., & Weng, R. C. (2004). Probability estimates for multi-class classification by pairwise coupling. Journal of Machine Learning Research, 5, 975–1005.

  • Zenko, B. (2007). Learning predictive clustering rules. Ph.D. thesis, University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia.

  • Zenko, B., Džeroski, S., & Struyf, J. (2006). Learning predictive clustering rules. In F. Bonchi & J.-F. Boulicaut (Eds.), Proceedings of the 4th International Workshop on Knowledge Discovery in Inductive Databases (KDID-05), Porto, Portugal (pp. 234–250). Berlin, Germany/New York: Springer.

  • Zimmermann, A., & De Raedt, L. (2009). Cluster-grouping: From subgroup discovery to clustering. Machine Learning, 77(1), 125–159.




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Fürnkranz, J., Gamberger, D., Lavrač, N. (2012). Beyond Concept Learning. In: Foundations of Rule Learning. Cognitive Technologies. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75197-7_10


  • DOI: https://doi.org/10.1007/978-3-540-75197-7_10


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75196-0

  • Online ISBN: 978-3-540-75197-7

  • eBook Packages: Computer Science (R0)
