Abstract
We propose DepMiner, a method implementing a simple but effective model for the evaluation of itemsets and, more generally, of the dependencies between the values assumed by a set of variables over finite domains. The method is based on Δ, the departure of the probability of an observed event from a referential probability of the same event. The observed probability is the probability, measured in the database, that the variables assume given values; the referential probability is the probability of the same event estimated under the condition of maximum entropy.
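The abstract defines Δ only informally, so the sketch below is one possible reading and not DepMiner's implementation. It assumes binary items, transactions given as Python sets, and that the referential probability is the value of P(all items present) that maximizes the entropy of the 2^k contingency table while every proper subset keeps its observed support. Under that assumption each cell of the table is a linear function of a single unknown, so the maximum-entropy estimate is a one-dimensional concave maximization, solved here by bisection. The function `delta` and its `iters` parameter are hypothetical names.

```python
import math
from itertools import combinations


def delta(db, itemset, iters=200):
    """Assumed Delta of `itemset` (2+ items) over `db`, a list of set transactions.

    Assumption: Delta = P_obs - P_ref, where P_ref maximizes the entropy of the
    2^k contingency table with all proper-subset supports held at observed values.
    """
    items, n = tuple(sorted(itemset)), len(db)
    if len(items) < 2:
        raise ValueError("Delta is defined here for itemsets of 2 or more items")
    full = frozenset(items)
    # Observed counts of every subset of the itemset (the empty set counts n).
    cnt = {frozenset(T): sum(1 for t in db if set(T) <= t)
           for r in range(len(items) + 1) for T in combinations(items, r)}
    # By inclusion-exclusion, the count of the cell "items in S present, all
    # others absent" is const + sign * x, where x is the (unknown) count of `full`.
    cells = []
    for r in range(len(items) + 1):
        for S in map(frozenset, combinations(items, r)):
            rest = [i for i in items if i not in S]
            const = sum((-1) ** m * cnt[S | frozenset(e)]
                        for m in range(len(rest) + 1)
                        for e in combinations(rest, m)
                        if S | frozenset(e) != full)
            cells.append((const, (-1) ** len(rest)))
    # Range of x that keeps every cell count non-negative.
    lo = max(-c for c, s in cells if s > 0)
    hi = min(c for c, s in cells if s < 0)
    a, b = float(lo), float(hi)
    # Entropy is concave in x: bisect on the sign of dH/dx = -sum(sign * log(cell)).
    for _ in range(iters if hi > lo else 0):
        mid = (a + b) / 2
        if -sum(s * math.log(c + s * mid) for c, s in cells) > 0:
            a = mid
        else:
            b = mid
    return (cnt[full] - (a + b) / 2) / n
```

For example, with `db = [{"a", "b"}] * 3 + [set()]`, `round(delta(db, {"a", "b"}), 6)` gives `0.1875`: for a pair, the maximum-entropy reference reduces to the independence product P(a)P(b) = 0.5625, against an observed 0.75. For `[{"a", "b", "c"}] * 2 + [set()] * 2`, `delta(..., "abc")` is `0.0`, because the pair supports already force the support of the triple; a naive comparison with P(a)P(b)P(c) = 0.125 would instead report a strong dependency that is only inherited from the subsets.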
DepMiner distinguishes dependencies among the variables that are intrinsic to the itemset from dependencies “inherited” from its subsets; it is therefore suitable for evaluating the utility of an itemset with respect to its subsets. The method detects both significant positive dependencies and significant negative ones, the latter being suitable for identifying rare itemsets. Since Δ is anti-monotonic, it can be embedded efficiently in mining algorithms. The system returns itemsets ranked by Δ and presents a histogram of the Δ distribution. The parameters that govern the method, such as the minimum support for itemsets and the thresholds on Δ, are determined automatically by the system, which uses the Δ thresholds to identify the statistically significant itemsets. As a result, it reduces the volume of results more than competing methods do.
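As one illustrative way of embedding such a score in a level-wise search, the hypothetical `mine` function below prunes candidates by minimum support (which is anti-monotone), scores each surviving itemset of two or more items with an assumed Δ (observed support minus the maximum-entropy support given all proper subsets), keeps those whose |Δ| reaches `min_abs_delta`, and ranks them by |Δ|. This is a sketch, not DepMiner's algorithm: it does not reproduce the paper's automatic choice of the minimum support and Δ thresholds, nor its use of the anti-monotonicity of Δ itself, so both thresholds are plain inputs here. Note that support pruning also discards rare itemsets, including some with strongly negative Δ.

```python
import math
from itertools import combinations


def delta(db, itemset, iters=200):
    """Assumed Delta: observed support minus the maximum-entropy support of
    `itemset` (2+ items) given the observed supports of all its proper subsets."""
    items, n = tuple(sorted(itemset)), len(db)
    if len(items) < 2:
        raise ValueError("Delta is defined here for itemsets of 2 or more items")
    full = frozenset(items)
    cnt = {frozenset(T): sum(1 for t in db if set(T) <= t)
           for r in range(len(items) + 1) for T in combinations(items, r)}
    cells = []  # each contingency cell count is const + sign * x, x = count of full
    for r in range(len(items) + 1):
        for S in map(frozenset, combinations(items, r)):
            rest = [i for i in items if i not in S]
            const = sum((-1) ** m * cnt[S | frozenset(e)]
                        for m in range(len(rest) + 1)
                        for e in combinations(rest, m)
                        if S | frozenset(e) != full)
            cells.append((const, (-1) ** len(rest)))
    lo = max(-c for c, s in cells if s > 0)
    hi = min(c for c, s in cells if s < 0)
    a, b = float(lo), float(hi)
    for _ in range(iters if hi > lo else 0):
        mid = (a + b) / 2  # bisection on the sign of the entropy derivative
        if -sum(s * math.log(c + s * mid) for c, s in cells) > 0:
            a = mid
        else:
            b = mid
    return (cnt[full] - (a + b) / 2) / n


def support(db, itemset):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in db if set(itemset) <= t) / len(db)


def mine(db, min_support, min_abs_delta):
    """Hypothetical level-wise miner: support pruning, then Delta filter and ranking."""
    level = [frozenset([i]) for i in sorted(set().union(*db))
             if support(db, [i]) >= min_support]
    results, k = [], 1
    while level:
        k += 1
        prev = set(level)
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        cands = {x | y for x in level for y in level if len(x | y) == k}
        # Prune: support is anti-monotone, so all (k-1)-subsets must be frequent.
        level = [c for c in cands
                 if all(frozenset(s) in prev for s in combinations(c, k - 1))
                 and support(db, c) >= min_support]
        for c in level:
            d = delta(db, c)
            if abs(d) >= min_abs_delta:
                results.append((tuple(sorted(c)), d))
    # Rank by |Delta| so strong positive and negative dependencies both surface.
    return sorted(results, key=lambda r: -abs(r[1]))
```

On `[{"a", "b", "c"}] * 2 + [set()] * 2` with `min_support=0.25` and `min_abs_delta=0.01`, this sketch returns the three pairs (each with Δ of about 0.25) but not `("a", "b", "c")`, whose dependency is fully inherited from the pairs under the assumed definition.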
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Meo, R., D’Ambrosi, L. (2012). DepMiner: A Method and a System for the Extraction of Significant Dependencies. In: Holmes, D.E., Jain, L.C. (eds) Data Mining: Foundations and Intelligent Paradigms. Intelligent Systems Reference Library, vol 23. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23166-7_8
DOI: https://doi.org/10.1007/978-3-642-23166-7_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23165-0
Online ISBN: 978-3-642-23166-7
eBook Packages: Engineering, Engineering (R0)