An Algorithm for Mining Implicit Itemset Pairs Based on Differences of Correlations

Taniguchi, Tsuyoshi; Haraguchi, Makoto

doi:10.1007/11563983_20

Tsuyoshi Taniguchi²¹ &
Makoto Haraguchi²¹

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 3735))

Included in the following conference series:

International Conference on Discovery Science

703 Accesses

Abstract

Given a transaction database as a global set of transactions and its local database obtained by some conditioning to the global one, we consider a pair of itemsets whose degrees of correlations are higher in the local database than in the global one. A problem of finding paired itemsets with high correlation in one database is known as Discovery of Correlation, and some algorithms to search for such characteristic paired itemsets are already proposed. However, even non-characteristic paired itemsets in the local database are also meaningful, provided the degree of correlation increases much higher in the local database than in the global one. They can be an implicit and hidden evidence showing that something particular to the local database occurs even though they are not yet realized as characteristic ones in the local. From this viewpoint, we have already proposed to measure the significance of paired itemsets by the difference of two correlations before and after the conditioning to the local database, and define a notion of DC pairs whose degrees of differences of correlations are high. As DC pairs are regarded as compound itemsets consisting of two component itemsets, we can have two basic strategies for finding them. One strategy firstly examines the compound itemsets and then the components, while another one does the component itemsets and then the compound ones. According to the former strategy, which we have already proposed and tested for its effectiveness, we have to enumerate many number of candidate compound itemsets that cannot be decomposable to components. For this reason, this paper presents a new algorithm according to the second strategy. It firstly enumerate possible component itemsets based on a new pruning rule for cutting off useless components. Secondly it forms the compound itemsets by combining the components thus detected, while we also make use of a constraint for preventing our algorithm from checking meaningless combinations.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Taniguchi, T., Haraguchi, M., Okubo, Y.: Discovery of hidden correlations in a local transaction database based on differences of correlations. In: Proc. of the IAPR Int’l Conf. on Machine Learning and Data Mining in Pattern Recognition (2005) (to appear)
Google Scholar
Bayardo, R.J.: Efficiently mining long patterns from databases. In: Proc. of the ACM-SIGMOD Int’l Conf. on Management of Data, pp. 85–93 (1998)
Google Scholar
Uno, T., Kiyomi, M., Arimura, H.: LCM ver.2: Efficient Mining Algorithms for Frequent/Closed/Maximal Itemsets. In: Proc. of the IEEE Int’l Conf. on data mining, 2nd Workshop on Frequent Itemset Mining Implementations (FIMI 2004) (2004)
Google Scholar
Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proc. of the Int’l Conf. on Very Large Data Bases, pp. 487–499 (1994)
Google Scholar
Dong, G., Li, J.: Efficient mining of emerging patterns: discovering trends and differences. In: Proc. of the 5th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining, pp. 43–52 (1999)
Google Scholar
Bay, S.D., Pazzani, M.J.: Detecting group differences: mining contrast sets. Data Mining and Knowledge Discovery 5(3), 213–246 (2001)
Article MATH Google Scholar
Webb, G.I., Butler, S., Newlands, D.: On detecting differences between groups. In: Proc. of the 9th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining, pp. 256–265 (2003)
Google Scholar
Brin, S., Motwani, R., Silverstein, C.: Beyond market baskets: generalizing association rules to correlations. In: Proc. of the ACM SIGMOD Int’l Conf. on Management of Data, vol. 26(2), pp. 265–276 (1997)
Google Scholar
Brin, S., Motwani, R., Ullman, J.D., Tsur, S.: Dynamic itemset counting and implication rules for market basket data. In: Proc. of the ACM SIGMOD Int’l Conf. on Management of Data, vol. 26(2), pp. 255–264 (1997)
Google Scholar
Aggarwal, C.C., Yu, P.S.: A new framework for itemset generation. In: Proc. of the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS 1998), pp. 18–24 (1998)
Google Scholar
Morishita, S., Sese, J.: Traversing itemset lattices with statistical metric pruning. In: Proc. of the ACM SIGACT-SIGMOD-SIGART Symposium on Database Systems (PODS), pp. 226–236 (2000)
Google Scholar
Ohsawa, Y., Nara, Y.: Understanding internet users on double helical model of chance-discovery process. In: Proc. of the IEEE Int’l Symposium on Intelligent Control, pp. 844–849 (2002)
Google Scholar
Hettich, S., Bay, S.D.: The UCI KDD Archive, Department of Information and Computer Science, University of California, Irvine, CA (1999), http://kdd.ics.uci.edu

Download references

Author information

Authors and Affiliations

Division of Computer Science, Hokkaido University, N-14 W-9, Sapporo, 060-0814, Japan
Tsuyoshi Taniguchi & Makoto Haraguchi

Authors

Tsuyoshi Taniguchi
View author publications
You can also search for this author in PubMed Google Scholar
Makoto Haraguchi
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

School of Computer Science & Engineering, The University of New South Wales, Sydney, Australia
Achim Hoffmann
Institute of Scientific and Industrial Research, Osaka University, 8-1 Mihogaoka, 567-0047, Ibaraki, Osaka, Japan
Hiroshi Motoda
Max Planck Institute for Computer Science, Saarbrücken, Germany
Tobias Scheffer

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Taniguchi, T., Haraguchi, M. (2005). An Algorithm for Mining Implicit Itemset Pairs Based on Differences of Correlations. In: Hoffmann, A., Motoda, H., Scheffer, T. (eds) Discovery Science. DS 2005. Lecture Notes in Computer Science(), vol 3735. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11563983_20

Download citation

DOI: https://doi.org/10.1007/11563983_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29230-2
Online ISBN: 978-3-540-31698-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics