DepMiner: A Method and a System for the Extraction of Significant Dependencies

  • Chapter

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 23)

Abstract

We propose DepMiner, a method implementing a simple but effective model for evaluating itemsets and, more generally, the dependencies between the values assumed by a set of variables on a domain of finite values. The method is based on Δ, the departure of the probability of an observed event from a referential probability of the same event. The observed probability is the probability that the variables assume given values in the database; the referential probability is the probability of the same event estimated under the condition of maximum entropy.
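As a rough illustration of the idea behind Δ, the sketch below computes it for a toy transaction database under a simplifying assumption: the referential probability is the maximum-entropy estimate constrained only by the single-item frequencies, which is the product of the individual item probabilities. The function names (`support`, `delta_independence`) and the toy data are hypothetical, and the reference used in the chapter itself may be constrained differently.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def delta_independence(transactions, itemset):
    """Observed probability minus a simplified referential probability.

    Assumption: the reference is the product of the single-item
    probabilities, i.e. the maximum-entropy estimate when only the
    single-item frequencies are known.
    """
    observed = support(transactions, itemset)
    reference = 1.0
    for item in itemset:
        reference *= support(transactions, [item])
    return observed - reference


# Toy database (hypothetical data): each transaction is a set of items.
transactions = [
    {"bread", "butter"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"milk"},
    {"bread"},
    {"butter", "milk"},
    {"milk"},
]

# Positive value: the items co-occur more often than the reference predicts.
print(delta_independence(transactions, ["bread", "butter"]))
# Negative value: the items co-occur less often than the reference predicts.
print(delta_independence(transactions, ["bread", "milk"]))
```

With this baseline, a positive Δ signals a positive dependence and a negative Δ a negative one; a value near zero means the itemset behaves as the reference predicts.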

DepMiner distinguishes between dependencies among the variables that are intrinsic to the itemset and dependencies “inherited” from its subsets; it is thus suitable for evaluating the utility of an itemset with respect to its subsets. The method is powerful: it detects significant positive dependencies and, at the same time, negative ones, which are suitable for identifying rare itemsets. Since Δ is anti-monotonic, it can be embedded efficiently in algorithms. The system returns itemsets ranked by Δ and presents a histogram of the Δ distribution. The parameters that govern the method, such as the minimum support for itemsets and the thresholds on Δ, are determined automatically by the system. The system uses the thresholds on Δ to identify the statistically significant itemsets, and thus succeeds in reducing the volume of results more than competing methods.
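The difference between an intrinsic and an “inherited” dependency can be seen on a small example. In the toy data below, items `a` and `b` always occur together while `c` is unrelated to them. Against a reference that knows only the single-item frequencies, the itemset {a, b, c} looks dependent; against a reference that already accounts for the pair {a, b}, the departure vanishes. Both references are hand-picked stand-ins chosen for illustration, and the helper names are hypothetical; the abstract does not specify how the maximum-entropy reference is actually constructed.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


# Toy database (hypothetical data): a and b always co-occur,
# c appears in half of the transactions regardless of a and b.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "b"},
    {"c"},
    {"c"},
    set(),
    set(),
]

observed = support(transactions, ["a", "b", "c"])

# Reference 1 (assumption): only single-item frequencies are known.
ref_singletons = (
    support(transactions, ["a"])
    * support(transactions, ["b"])
    * support(transactions, ["c"])
)

# Reference 2 (assumption): the pair {a, b} is known, c is independent of it.
ref_with_pair = support(transactions, ["a", "b"]) * support(transactions, ["c"])

delta_vs_singletons = observed - ref_singletons  # nonzero: includes the a-b link
delta_vs_pair = observed - ref_with_pair         # zero: nothing new beyond {a, b}

print(delta_vs_singletons, delta_vs_pair)
```

Here the whole departure of {a, b, c} from the singleton-based reference is explained by its subset {a, b}, so the larger itemset adds no information of its own.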

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Meo, R., D’Ambrosi, L. (2012). DepMiner: A Method and a System for the Extraction of Significant Dependencies. In: Holmes, D.E., Jain, L.C. (eds) Data Mining: Foundations and Intelligent Paradigms. Intelligent Systems Reference Library, vol 23. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23166-7_8

  • DOI: https://doi.org/10.1007/978-3-642-23166-7_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23165-0

  • Online ISBN: 978-3-642-23166-7

  • eBook Packages: Engineering, Engineering (R0)
