Approximate Frequent Itemset Mining In the Presence of Random Noise

Frequent itemset mining has been a central theme in data mining research and is an important first step in analyzing data from a broad range of applications. The traditional exact model of a frequent itemset requires that every item of the itemset occur in each supporting transaction. Real application data, however, is usually subject to random noise or measurement error, which poses new challenges for the efficient discovery of frequent itemsets from noisy data. Mining approximate frequent itemsets in the presence of noise involves two key issues: the definition of a noise-tolerant mining model and the design of an efficient mining algorithm. In this chapter, we give an overview of approximate itemset mining algorithms in the presence of random noise and examine several noise-tolerant mining approaches.

Keywords: error-tolerant itemset, approximate frequent itemset, core pattern recovery
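
The contrast drawn in the abstract between the exact model and a noise-tolerant one can be illustrated with a small sketch. It is not the model defined in the chapter: the function names, the `max_missing_fraction` parameter, and the toy transactions are all assumptions made for illustration, and the relaxation shown is a deliberately simplified one that only bounds how many items each transaction may miss.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def exact_support(itemset: Iterable[str], transactions: List[Transaction]) -> int:
    """Exact model: a transaction supports the itemset only if it contains every item."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def tolerant_support(
    itemset: Iterable[str],
    transactions: List[Transaction],
    max_missing_fraction: float,
) -> int:
    """Simplified noise-tolerant count (hypothetical, not the chapter's definition):
    a transaction supports the itemset if it misses at most
    max_missing_fraction * |itemset| of the items."""
    items = frozenset(itemset)
    allowed_missing = max_missing_fraction * len(items)
    return sum(1 for t in transactions if len(items - t) <= allowed_missing)


# Toy data (made up): the pattern {a, b, c, d} with one item dropped
# from three of the transactions, plus one unrelated transaction.
transactions = [
    frozenset("abcd"),
    frozenset("abc"),
    frozenset("abd"),
    frozenset("acd"),
    frozenset("e"),
]

print(exact_support("abcd", transactions))           # only the complete transaction counts
print(tolerant_support("abcd", transactions, 0.25))  # one missing item per transaction is tolerated
```

With a tolerance of zero, `tolerant_support` reduces to `exact_support`. Raising the tolerance recovers transactions in which noise removed an item, at the cost of a larger search space, which is why the abstract pairs the choice of model with the design of an efficient algorithm.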

Copyright information

© 2008 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Cheng, H., Yu, P.S., Han, J. (2008). Approximate Frequent Itemset Mining In the Presence of Random Noise. In: Maimon, O., Rokach, L. (eds) Soft Computing for Knowledge Discovery and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-69935-6_15

  • DOI: https://doi.org/10.1007/978-0-387-69935-6_15

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-69934-9

  • Online ISBN: 978-0-387-69935-6

  • eBook Packages: Computer Science, Computer Science (R0)
