Efficient redundancy reduced subgroup discovery via quadratic programming
Subgroup discovery is a task at the intersection of predictive and descriptive induction. It aims to identify subgroups whose statistical (distributional) characteristics are most unusual with respect to a property of interest. Although a great deal of work has been devoted to the topic, one remaining problem is the redundancy of subgroup descriptions, which often convey very similar information. In this paper, we propose a quadratic programming based approach to reduce the amount of redundancy in the subgroup rules. Experimental results on 12 datasets show that the resulting subgroups are less redundant than those produced by standard methods. In addition, our experiments show that the computational costs are significantly lower than those of the other methods compared in the paper.
Keywords: Subgroup discovery, Mutual information, Quadratic programming, Rule learning, Redundancy
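The abstract does not spell out the optimization problem, so the following is only a minimal sketch of how a quadratic program can trade subgroup quality against redundancy. It is not the authors' implementation. It assumes the general form used in quadratic programming feature selection: minimize 0.5(1 − α) xᵀQx − α fᵀx subject to x ≥ 0 and Σx = 1, where Q holds pairwise redundancy and f holds per-subgroup quality. Everything else is an illustrative assumption: Jaccard overlap of the covered instances stands in for the redundancy measure (the keywords suggest the paper uses mutual information), weighted relative accuracy (WRAcc) is used as the quality measure, qualities are rescaled by their maximum, and the function names, the value of `alpha`, the toy data, and the projected-gradient solver are all hypothetical choices.

```python
def wracc(cover, target):
    """Weighted relative accuracy of a subgroup given 0/1 coverage and labels."""
    n = len(target)
    n_sg = sum(cover)
    if n_sg == 0:
        return 0.0
    p0 = sum(target) / n
    p_sg = sum(t for c, t in zip(cover, target) if c) / n_sg
    return (n_sg / n) * (p_sg - p0)


def jaccard(a, b):
    """Overlap of two coverage vectors (assumed stand-in for redundancy)."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0


def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        t = (cumulative - 1.0) / j
        if uj - t > 0:
            theta = t
    return [max(vi - theta, 0.0) for vi in v]


def select_weights(covers, target, alpha=0.5, steps=500, lr=0.5):
    """Projected gradient descent on 0.5*(1-alpha)*x'Qx - alpha*f'x."""
    k = len(covers)
    f = [max(wracc(c, target), 0.0) for c in covers]
    f_max = max(f) or 1.0
    f = [fi / f_max for fi in f]  # assumption: rescale quality to [0, 1]
    q = [[jaccard(a, b) for b in covers] for a in covers]
    x = [1.0 / k] * k
    for _ in range(steps):
        grad = [
            (1 - alpha) * sum(q[i][j] * x[j] for j in range(k)) - alpha * f[i]
            for i in range(k)
        ]
        x = project_to_simplex([xi - lr * g for xi, g in zip(x, grad)])
    return x


# Toy data: 8 instances, 3 candidate subgroups given by their coverage.
target = [1, 1, 1, 1, 0, 0, 0, 0]
covers = [
    [1, 1, 1, 0, 0, 0, 0, 0],  # A: high quality
    [1, 1, 1, 0, 1, 0, 0, 0],  # B: near-duplicate of A, slightly worse
    [0, 0, 0, 1, 0, 0, 0, 0],  # C: lower quality, but disjoint from A
]
weights = select_weights(covers, target)
```

In this toy setting, subgroup B has a higher WRAcc than C but overlaps heavily with A, so the solver gives C more weight than B. Ranking subgroups by `weights` rather than by quality alone is the kind of redundancy-aware selection the abstract describes, under the assumptions stated above.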
The first author acknowledges the support of the TUM Graduate School of Information Science in Health (GSISH), Technische Universität München.