Approximate Frequent Itemset Mining In the Presence of Random Noise

Cheng, Hong; Yu, Philip S.; Han, Jiawei

doi:10.1007/978-0-387-69935-6_15


Frequent itemset mining has been a central theme in data mining research and is an important first step in analyzing data from a broad range of applications. The traditional exact model requires that every item of a frequent itemset occur in each supporting transaction. However, real application data are usually subject to random noise or measurement error, which poses new challenges for efficiently discovering frequent itemsets in noisy data. Mining approximate frequent itemsets in the presence of noise involves two key issues: defining a noise-tolerant mining model and designing an efficient mining algorithm. In this chapter, we give an overview of approximate itemset mining algorithms in the presence of random noise and examine several noise-tolerant mining approaches.

Key words: error-tolerant itemset, approximate frequent itemset, core pattern recovery
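
To make the contrast between the exact model and a noise-tolerant one concrete, here is a minimal Python sketch. It is illustrative only: the function names, the `max_missing_fraction` parameter, and the relaxation rule (a transaction still supports an itemset if it misses at most a fixed fraction of its items) are assumptions made for this example, not the definitions used in the chapter.

```python
from typing import FrozenSet, Iterable

Transaction = FrozenSet[str]


def exact_support(itemset: FrozenSet[str], transactions: Iterable[Transaction]) -> int:
    """Count transactions that contain every item of the itemset (exact model)."""
    return sum(1 for t in transactions if itemset <= t)


def tolerant_support(
    itemset: FrozenSet[str],
    transactions: Iterable[Transaction],
    max_missing_fraction: float,
) -> int:
    """Count transactions that miss at most a given fraction of the itemset.

    Hypothetical relaxation for illustration: a transaction supports the
    itemset if the number of absent items does not exceed
    int(max_missing_fraction * len(itemset)).
    """
    allowed_missing = int(max_missing_fraction * len(itemset))
    return sum(1 for t in transactions if len(itemset - t) <= allowed_missing)


# Toy data: two transactions contain {a, b, c} fully; two others each miss one item.
transactions = [
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ac"),
    frozenset("bd"),
    frozenset("abcd"),
]
itemset = frozenset("abc")

print(exact_support(itemset, transactions))
print(tolerant_support(itemset, transactions, 0.34))
```

With the toy data above, the relaxed count is larger than the exact count because transactions missing a single item are no longer discarded. A real noise-tolerant model would also need to control how errors are distributed across items, which this sketch does not attempt.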



Author information

Authors and Affiliations

University of Illinois at Urbana-Champaign, IL, USA
Hong Cheng & Jiawei Han
IBM T. J. Watson Research Center, USA
Philip S. Yu


Editor information

Editors and Affiliations

Tel Aviv University, 69978, Tel Aviv, Israel
Oded Maimon
Ben-Gurion University, 84105, Beer-Sheva, Israel
Lior Rokach


Cheng, H., Yu, P.S., Han, J. (2008). Approximate Frequent Itemset Mining In the Presence of Random Noise. In: Maimon, O., Rokach, L. (eds) Soft Computing for Knowledge Discovery and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-69935-6_15


DOI: https://doi.org/10.1007/978-0-387-69935-6_15
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-69934-9
Online ISBN: 978-0-387-69935-6
eBook Packages: Computer Science, Computer Science (R0)
