Frequent itemset mining has been a central theme in data mining research and is an important first step in analyzing data from a broad range of applications. The traditional exact model of a frequent itemset requires that every item occur in each supporting transaction. However, real application data is usually subject to random noise or measurement error, which poses new challenges for the efficient discovery of frequent itemsets from noisy data. Mining approximate frequent itemsets in the presence of noise involves two key issues: the definition of a noise-tolerant mining model and the design of an efficient mining algorithm. In this chapter, we give an overview of approximate itemset mining algorithms in the presence of random noise and examine several noise-tolerant mining approaches.
Key words: error-tolerant itemset, approximate frequent itemset, core pattern recovery
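To make the contrast between the exact model and a noise-tolerant one concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions, not one of the models covered in the chapter: the function names, the `max_missing` parameter, and the toy transactions are all hypothetical, and the relaxation shown (a transaction still counts as supporting if it misses at most a fixed number of the itemset's items) is only one simple way such tolerance could be defined.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def exact_support(itemset: FrozenSet[str], transactions: List[Transaction]) -> int:
    """Count transactions that contain every item of the itemset (exact model)."""
    return sum(1 for t in transactions if itemset <= t)


def tolerant_support(
    itemset: FrozenSet[str],
    transactions: List[Transaction],
    max_missing: int,
) -> int:
    """Count transactions that miss at most `max_missing` items of the itemset.

    `max_missing` is a hypothetical parameter used for illustration only;
    with max_missing=0 this reduces to the exact model.
    """
    return sum(1 for t in transactions if len(itemset - t) <= max_missing)


# Toy data (invented for this example): noise has dropped one item
# from several transactions that would otherwise contain {a, b, c}.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c", "d"}),
    frozenset({"b", "c"}),
    frozenset({"d"}),
]
pattern = frozenset({"a", "b", "c"})

exact = exact_support(pattern, transactions)
tolerant = tolerant_support(pattern, transactions, max_missing=1)
print(exact, tolerant)
```

On this toy data the exact count is much lower than the tolerant count, which is the effect the abstract describes: under noise, a strict containment requirement can hide a pattern that most transactions nearly support.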
Copyright information
© 2008 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Cheng, H., Yu, P.S., Han, J. (2008). Approximate Frequent Itemset Mining In the Presence of Random Noise. In: Maimon, O., Rokach, L. (eds) Soft Computing for Knowledge Discovery and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-69935-6_15
DOI: https://doi.org/10.1007/978-0-387-69935-6_15
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-69934-9
Online ISBN: 978-0-387-69935-6
eBook Packages: Computer Science (R0)