Abstract
Subgroup discovery aims at finding interesting subsets of a classified example set that deviate from the overall distribution. The search is guided by a so-called utility function, which trades the size of a subset (its coverage) against its statistical unusualness. By choosing the utility function accordingly, subgroup discovery is well suited to finding interesting rules of much smaller coverage, and under a different bias, than standard classifier induction algorithms allow. Smaller subsets can be considered local patterns, but this work adopts a different definition: global patterns comprise all patterns reflecting the prior knowledge available to a learner, including all previously found patterns, while any further unexpected regularities in the data are referred to as local patterns. To address local pattern mining in this scenario, an extension of subgroup discovery by the knowledge-based sampling approach to iterative model refinement is presented. It is a general, inexpensive way of incorporating prior probabilistic knowledge of arbitrary form into data mining algorithms that address supervised learning tasks.
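The two ideas in the abstract can be sketched in a few lines. The sketch below is illustrative only and does not come from the paper: it uses weighted relative accuracy (WRAcc) as one concrete choice of utility function trading coverage against unusualness, and a simple example-reweighting step in the spirit of knowledge-based sampling, which downweights a previously found pattern so that it no longer deviates from the prior class rate. Function names and signatures are assumptions for illustration.

```python
def wracc(n_total, n_pos, n_sub, n_sub_pos):
    """One possible utility function: weighted relative accuracy.

    WRAcc(S) = coverage(S) * (p(+|S) - p(+)), i.e. subset size times
    the deviation of the subset's class rate from the overall rate.
    """
    coverage = n_sub / n_total          # fraction of examples covered
    p0 = n_pos / n_total                # overall positive rate
    p_sub = n_sub_pos / n_sub if n_sub else 0.0
    return coverage * (p_sub - p0)


def kbs_weights(labels, covered, prior_rate):
    """Sketch of a knowledge-based-sampling-style reweighting step.

    labels:  list of bools (example is positive)
    covered: list of bools (example is covered by a known pattern)
    Rescales weights inside the pattern so its weighted class rate
    matches prior_rate, making the known pattern uninteresting for
    the next discovery round. Assumes 0 < rate < 1 inside the cover.
    """
    n_cov = sum(covered)
    n_cov_pos = sum(1 for y, c in zip(labels, covered) if c and y)
    rate = n_cov_pos / n_cov            # observed rate inside the pattern
    weights = []
    for y, c in zip(labels, covered):
        if not c:
            weights.append(1.0)         # uncovered examples are untouched
        elif y:
            weights.append(prior_rate / rate)
        else:
            weights.append((1 - prior_rate) / (1 - rate))
    return weights
```

Iterating discovery on the reweighted examples yields the kind of sequential model refinement the abstract describes: each round, patterns explained by prior knowledge (including earlier findings) score near zero utility, so the search moves on to genuinely local patterns.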
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Scholz, M. (2005). Knowledge-Based Sampling for Subgroup Discovery. In: Morik, K., Boulicaut, JF., Siebes, A. (eds) Local Pattern Detection. Lecture Notes in Computer Science(), vol 3539. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11504245_11
Print ISBN: 978-3-540-26543-6
Online ISBN: 978-3-540-31894-1