DepMiner: A Method and a System for the Extraction of Significant Dependencies

Meo, Rosa; D’Ambrosi, Leonardo

doi:10.1007/978-3-642-23166-7_8

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 23)

Abstract

We propose DepMiner, a method implementing a simple but effective model for the evaluation of itemsets and, more generally, of the dependencies between the values assumed by a set of variables over a finite domain. The method is based on Δ, the departure of the observed probability of an event from a referential probability of the same event. The observed probability is the probability that the variables assume given values in the database; the referential probability is the probability of the same event estimated under the condition of maximum entropy.
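The definition of Δ can be illustrated with a minimal sketch. It is not the authors' implementation and rests on two labeled assumptions: Δ is taken to be the plain difference between the observed and the referential probability, and the maximum-entropy estimate is constrained only by the single-item probabilities, in which case it reduces to their product. The function names `probability` and `delta_independence` are hypothetical.

```python
def probability(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted <= set(t))
    return hits / len(transactions)


def delta_independence(transactions, itemset):
    """Observed probability minus a referential probability.

    Assumption: the referential probability is the maximum-entropy
    estimate constrained only by the single-item probabilities,
    which is the product of those probabilities.
    """
    observed = probability(transactions, itemset)
    reference = 1.0
    for item in itemset:
        reference *= probability(transactions, [item])
    return observed - reference


# Toy database (hypothetical data): four transactions over items a, b, c.
transactions = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
print(delta_independence(transactions, ["a", "b"]))  # 0.5 - 0.75 * 0.5 = 0.125
```

A positive value indicates that the items co-occur more often than the reference predicts, and a negative value that they co-occur less often.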

DepMiner distinguishes between dependencies among the variables that are intrinsic to the itemset and dependencies “inherited” from its subsets; it is therefore suitable for evaluating the utility of an itemset with respect to its subsets. The method detects significant positive dependencies and, at the same time, negative ones, which are suitable for identifying rare itemsets. Since Δ is anti-monotonic, it can be embedded efficiently in algorithms. The system returns itemsets ranked by Δ and presents a histogram of the Δ distribution. The parameters that govern the method, such as the minimum support for itemsets and the thresholds on Δ, are determined automatically by the system. The system uses the thresholds on Δ to identify the statistically significant itemsets, and thus reduces the volume of results more than competing methods.
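The difference between intrinsic and inherited dependencies can be sketched for three binary variables. The abstract does not spell out how the maximum-entropy reference is constrained, so the code below assumes one plausible reading: the reference must reproduce every pairwise marginal of the data, and it is obtained by iterative proportional fitting from a uniform start. All function names are hypothetical and the data sets are toy examples.

```python
from itertools import product

PAIRS = [(0, 1), (0, 2), (1, 2)]


def empirical_joint(rows):
    """Relative frequency of each 0/1 triple in `rows`."""
    joint = {cell: 0.0 for cell in product((0, 1), repeat=3)}
    for row in rows:
        joint[tuple(row)] += 1.0 / len(rows)
    return joint


def pair_marginal(joint, i, j):
    """Marginal distribution of variables i and j."""
    marginal = {}
    for cell, p in joint.items():
        key = (cell[i], cell[j])
        marginal[key] = marginal.get(key, 0.0) + p
    return marginal


def maxent_given_pairs(observed, sweeps=200):
    """Iterative proportional fitting from a uniform start.

    Assumption: the reference distribution is the maximum-entropy
    one that matches all pairwise marginals of `observed`.
    """
    fitted = {cell: 1.0 / 8 for cell in observed}
    targets = {pair: pair_marginal(observed, *pair) for pair in PAIRS}
    for _ in range(sweeps):
        for i, j in PAIRS:
            current = pair_marginal(fitted, i, j)
            for cell in fitted:
                key = (cell[i], cell[j])
                if current[key] > 0:  # cells already at zero stay at zero
                    fitted[cell] *= targets[(i, j)][key] / current[key]
    return fitted


def delta_triple(rows):
    """Observed minus referential probability that all three variables are 1."""
    observed = empirical_joint(rows)
    reference = maxent_given_pairs(observed)
    return observed[(1, 1, 1)] - reference[(1, 1, 1)]


# Third variable is the XOR of the first two: every pair looks independent,
# so the dependency belongs to the triple itself.
xor_rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
# First two variables are identical, third is unrelated: the dependency of
# the triple is fully explained by the pair formed by the first two variables.
copy_rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]

print(delta_triple(xor_rows))   # -0.125
print(delta_triple(copy_rows))  # 0.0
```

Under these assumptions, the XOR data yields a negative Δ because the triple never occurs even though its pairs give no hint of that, while the copied-variable data yields a Δ of zero because the pairwise constraints already account for the observed frequency of the triple.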

Author information

Authors and Affiliations

University of Torino, Italy
Rosa Meo
Regional Agency for Health Care Services - A.Re.S.S. Piemonte, Italy
Leonardo D’Ambrosi

Editor information

Editors and Affiliations

Department of Statistics and Applied Probability, University of California, 93106, Santa Barbara, CA, USA
Dawn E. Holmes
Knowledge-Based Engineering, University of South Australia, 5095, Adelaide, Mawson Lakes, SA, Australia
Lakhmi C. Jain

About this chapter

Cite this chapter

Meo, R., D’Ambrosi, L. (2012). DepMiner: A Method and a System for the Extraction of Significant Dependencies. In: Holmes, D.E., Jain, L.C. (eds) Data Mining: Foundations and Intelligent Paradigms. Intelligent Systems Reference Library, vol 23. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23166-7_8

Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23165-0
Online ISBN: 978-3-642-23166-7
eBook Packages: Engineering, Engineering (R0)
