Skip to main content

An Algorithm for Mining Implicit Itemset Pairs Based on Differences of Correlations

  • Conference paper
Book cover Discovery Science (DS 2005)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 3735))

Included in the following conference series:

  • 703 Accesses

Abstract

Given a transaction database as a global set of transactions and its local database obtained by some conditioning to the global one, we consider a pair of itemsets whose degrees of correlations are higher in the local database than in the global one. A problem of finding paired itemsets with high correlation in one database is known as Discovery of Correlation, and some algorithms to search for such characteristic paired itemsets are already proposed. However, even non-characteristic paired itemsets in the local database are also meaningful, provided the degree of correlation increases much higher in the local database than in the global one. They can be an implicit and hidden evidence showing that something particular to the local database occurs even though they are not yet realized as characteristic ones in the local. From this viewpoint, we have already proposed to measure the significance of paired itemsets by the difference of two correlations before and after the conditioning to the local database, and define a notion of DC pairs whose degrees of differences of correlations are high. As DC pairs are regarded as compound itemsets consisting of two component itemsets, we can have two basic strategies for finding them. One strategy firstly examines the compound itemsets and then the components, while another one does the component itemsets and then the compound ones. According to the former strategy, which we have already proposed and tested for its effectiveness, we have to enumerate many number of candidate compound itemsets that cannot be decomposable to components. For this reason, this paper presents a new algorithm according to the second strategy. It firstly enumerate possible component itemsets based on a new pruning rule for cutting off useless components. Secondly it forms the compound itemsets by combining the components thus detected, while we also make use of a constraint for preventing our algorithm from checking meaningless combinations.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Taniguchi, T., Haraguchi, M., Okubo, Y.: Discovery of hidden correlations in a local transaction database based on differences of correlations. In: Proc. of the IAPR Int’l Conf. on Machine Learning and Data Mining in Pattern Recognition (2005) (to appear)

    Google Scholar 

  2. Bayardo, R.J.: Efficiently mining long patterns from databases. In: Proc. of the ACM-SIGMOD Int’l Conf. on Management of Data, pp. 85–93 (1998)

    Google Scholar 

  3. Uno, T., Kiyomi, M., Arimura, H.: LCM ver.2: Efficient Mining Algorithms for Frequent/Closed/Maximal Itemsets. In: Proc. of the IEEE Int’l Conf. on data mining, 2nd Workshop on Frequent Itemset Mining Implementations (FIMI 2004) (2004)

    Google Scholar 

  4. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proc. of the Int’l Conf. on Very Large Data Bases, pp. 487–499 (1994)

    Google Scholar 

  5. Dong, G., Li, J.: Efficient mining of emerging patterns: discovering trends and differences. In: Proc. of the 5th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining, pp. 43–52 (1999)

    Google Scholar 

  6. Bay, S.D., Pazzani, M.J.: Detecting group differences: mining contrast sets. Data Mining and Knowledge Discovery 5(3), 213–246 (2001)

    Article  MATH  Google Scholar 

  7. Webb, G.I., Butler, S., Newlands, D.: On detecting differences between groups. In: Proc. of the 9th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining, pp. 256–265 (2003)

    Google Scholar 

  8. Brin, S., Motwani, R., Silverstein, C.: Beyond market baskets: generalizing association rules to correlations. In: Proc. of the ACM SIGMOD Int’l Conf. on Management of Data, vol. 26(2), pp. 265–276 (1997)

    Google Scholar 

  9. Brin, S., Motwani, R., Ullman, J.D., Tsur, S.: Dynamic itemset counting and implication rules for market basket data. In: Proc. of the ACM SIGMOD Int’l Conf. on Management of Data, vol. 26(2), pp. 255–264 (1997)

    Google Scholar 

  10. Aggarwal, C.C., Yu, P.S.: A new framework for itemset generation. In: Proc. of the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS 1998), pp. 18–24 (1998)

    Google Scholar 

  11. Morishita, S., Sese, J.: Traversing itemset lattices with statistical metric pruning. In: Proc. of the ACM SIGACT-SIGMOD-SIGART Symposium on Database Systems (PODS), pp. 226–236 (2000)

    Google Scholar 

  12. Ohsawa, Y., Nara, Y.: Understanding internet users on double helical model of chance-discovery process. In: Proc. of the IEEE Int’l Symposium on Intelligent Control, pp. 844–849 (2002)

    Google Scholar 

  13. Hettich, S., Bay, S.D.: The UCI KDD Archive, Department of Information and Computer Science, University of California, Irvine, CA (1999), http://kdd.ics.uci.edu

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Taniguchi, T., Haraguchi, M. (2005). An Algorithm for Mining Implicit Itemset Pairs Based on Differences of Correlations. In: Hoffmann, A., Motoda, H., Scheffer, T. (eds) Discovery Science. DS 2005. Lecture Notes in Computer Science(), vol 3735. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11563983_20

Download citation

  • DOI: https://doi.org/10.1007/11563983_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29230-2

  • Online ISBN: 978-3-540-31698-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics