Abstract
In the previous chapter we discussed methods that find patterns of various shapes in data sets. All of these methods require a measure of similarity in order to group similar objects. In this chapter we discuss methods that address a very different setup: instead of finding structure in a data set, we focus on methods that find explanations for an unknown dependency within the data. Such a search usually centers on a so-called target attribute, that is, we are particularly interested in why one specific attribute has a certain value. If the target attribute is nominal, we speak of a classification problem; if it is numerical, we speak of a regression problem. Examples of such problems are understanding why a customer belongs to the category of people who cancel their account (i.e., classifying her into a yes/no category) or better understanding the risk factors of customers in general.
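The distinction between the two problem types can be sketched in a few lines of Python. The customer data, the attribute names, and the two toy models below are invented for illustration only; they are not taken from the chapter:

```python
# Minimal sketch of the two problem types: a nominal target
# (classification) versus a numeric target (regression).
# All data and thresholds here are hypothetical.

# Each record: descriptive attributes plus a target attribute.
customers = [
    {"age": 25, "contract_months": 3,  "cancelled": "yes"},
    {"age": 47, "contract_months": 36, "cancelled": "no"},
    {"age": 31, "contract_months": 6,  "cancelled": "yes"},
    {"age": 52, "contract_months": 48, "cancelled": "no"},
]

# Classification: the target ("cancelled") is nominal.
# A trivial model: predict "yes" if the contract is shorter than a threshold.
def classify(record, threshold=12):
    return "yes" if record["contract_months"] < threshold else "no"

# Regression: the target is numeric, e.g. a risk score in [0, 1].
# A trivial model: risk decreases linearly with contract length.
def predict_risk(record):
    return max(0.0, 1.0 - record["contract_months"] / 48)

print([classify(c) for c in customers])                # ['yes', 'no', 'yes', 'no']
print([round(predict_risk(c), 2) for c in customers])  # [0.94, 0.25, 0.88, 0.0]
```

The methods in this chapter replace such hand-written rules with models learned from the data itself.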
Notes
1. Rumors say that ID3 stands for "Iterative Dichotomiser 3" (from Greek dichotomia: division); supposedly it was Quinlan's third attempt. Another interpretation, offered by one of the authors, is "Induction of Decision 3rees."
2. Quinlan later also developed methods for regression problems, similar to CART.
3. Since one or more of them may be metric, we may have to use a probability density function f to refer to the descriptive attributes: f(x∣y). However, we ignore such notational subtleties here.
4. This is a prior probability, because it describes the class probability before observing the values of any descriptive attributes.
5. For more details see also Sect. 5.4.
6. Note, however, that this second property can also be a disadvantage, as it can give outliers an overly strong influence on the regression result.
7. Note that we are not saying much about the truthfulness or precision of rules at this stage.
8. Note that this is a substantial deviation from the abstract concept of rule learners in Mitchell's version space setup: real-world rule learners usually do not investigate all more general (or more specific) rules but only a subset chosen by the employed heuristic(s).
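The caveat in note 6, that minimizing squared errors gives outliers a strong pull on the fit, can be illustrated with a small sketch. The data set and the plain ordinary-least-squares implementation below are illustrative only, not taken from the chapter:

```python
# Ordinary least squares fit y = a + b*x for a tiny (hypothetical)
# data set, once without and once with a single outlier.

def ols(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared errors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 5]          # perfectly linear data
print(ols(xs, ys))            # (0.0, 1.0): slope 1, as expected

ys_out = [1, 2, 3, 4, 25]     # same data, one outlier at x = 5
print(ols(xs, ys_out))        # (-8.0, 5.0): the slope is pulled from 1 to 5
```

Because the outlier's residual enters the objective squared, a single corrupted point dominates the fit, which is exactly the disadvantage the note warns about.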
References
Albert, A.: Regression and the Moore–Penrose Pseudoinverse. Academic Press, New York (1972)
Anderson, E.: The irises of the Gaspé Peninsula. Bull. Am. Iris Soc. 59, 2–5 (1935)
Berthold, M.R.: Fuzzy logic. In: Berthold, M.R., Hand, D.J. (eds.) Intelligent Data Analysis: An Introduction, 2nd edn. Springer, Berlin (2003)
Borgelt, C., Steinbrecher, M., Kruse, R.: Graphical Models—Representations for Learning, Reasoning and Data Mining, 2nd edn. Wiley, Chichester (2009)
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: CART: Classification and Regression Trees. Wadsworth, Belmont (1983)
Clark, P., Niblett, T.: The CN2 induction algorithm. Mach. Learn. 3(4), 261–283 (1989)
Domingos, P., Pazzani, M.: On the optimality of the simple Bayesian classifier under zero-one loss. Mach. Learn. 29, 103–137 (1997)
Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugen. 7(2), 179–188 (1936)
Friedman, N., Goldszmidt, M.: Building classifiers using Bayesian networks. In: Proc. 13th Nat. Conf. on Artificial Intelligence (AAAI’96, Portland, OR, USA), pp. 1277–1284. AAAI Press, Menlo Park (1996)
Geiger, D.: An entropy-based learning algorithm of Bayesian conditional trees. In: Proc. 8th Conf. on Uncertainty in Artificial Intelligence (UAI’92, Stanford, CA, USA), pp. 92–97. Morgan Kaufmann, San Mateo (1992)
Goodman, R.M., Smyth, P.: An information-theoretic model for rule-based expert systems. In: Int. Symposium on Information Theory, Kobe, Japan (1988)
Janikow, C.Z.: Fuzzy decision trees: issues and methods. IEEE Trans. Syst. Man, Cybern., Part B 28(1), 1–14 (1998)
Jensen, F.V., Nielsen, T.D.: Bayesian Networks and Decision Graphs, 2nd edn. Springer, London (2007)
Larrañaga, P., Poza, M., Yurramendi, Y., Murga, R., Kuijpers, C.: Structural learning of Bayesian networks by genetic algorithms: a performance analysis of control parameters. IEEE Trans. Pattern Anal. Mach. Intell. 18, 912–926 (1996)
Mitchell, T.: Machine Learning. McGraw-Hill, New York (1997)
Nauck, D., Klawonn, F., Kruse, R.: Neuro-Fuzzy Systems. Wiley, Chichester (1997)
Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo (1993)
Quinlan, J.R., Cameron-Jones, R.M.: FOIL: a midterm report. In: Proc. European Conference on Machine Learning. Lecture Notes in Computer Science, vol. 667, pp. 3–20. Springer, Berlin (1993)
Sahami, M.: Learning limited dependence Bayesian classifiers. In: Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (KDD’96, Portland, OR, USA), pp. 335–338. AAAI Press, Menlo Park (1996)
© 2010 Springer-Verlag London Limited

Berthold, M.R., Borgelt, C., Höppner, F., Klawonn, F. (2010). Finding Explanations. In: Guide to Intelligent Data Analysis. Texts in Computer Science. Springer, London. https://doi.org/10.1007/978-1-84882-260-3_8