Approximate Frequent Itemset Discovery from Data Stream

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5883)

Abstract

Traditional algorithms for frequent itemset discovery are designed for static data. They cannot be applied directly to data streams, which are continuous and unbounded, usually arrive at high speed, and often have a data distribution that changes over time. The main challenges of frequent pattern mining in data streams are avoiding multiple scans of the entire dataset, optimizing memory usage, and capturing distribution drift. To address these challenges, we propose a novel algorithm based on a sliding window model, which handles efficiency issues and keeps up with distribution change. Each window consists of several slides. Itemsets are generated locally within each slide, while their approximate support is estimated over the whole window. Efficient itemset generation is ensured by a synopsis structure called the SE-tree. Experiments demonstrate the effectiveness of the proposed algorithm.
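The slide/window split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it replaces the SE-tree with brute-force enumeration of small itemsets, and the class name SlidingWindowMiner, its methods, and the parameters slides_per_window, min_support, and max_size are hypothetical names chosen for illustration. It shows only the general idea that itemsets are generated per slide and that support is approximated over the window from the per-slide results.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowMiner:
    """Illustrative sketch only (hypothetical names, no SE-tree)."""

    def __init__(self, slides_per_window, min_support, max_size=2):
        self.min_support = min_support  # relative support threshold in [0, 1]
        self.max_size = max_size        # cap on itemset size (assumption)
        # One entry per slide: (transaction count, locally frequent itemsets).
        # The deque drops the oldest slide once the window is full.
        self.slides = deque(maxlen=slides_per_window)

    def add_slide(self, transactions):
        """Generate itemsets locally: keep those frequent within this slide."""
        counts = Counter()
        for transaction in transactions:
            items = sorted(set(transaction))
            for k in range(1, self.max_size + 1):
                counts.update(combinations(items, k))
        threshold = self.min_support * len(transactions)
        local = Counter({s: c for s, c in counts.items() if c >= threshold})
        self.slides.append((len(transactions), local))

    def frequent_itemsets(self):
        """Estimate window-level support by summing the stored slide counts."""
        total = sum(n for n, _ in self.slides)
        merged = Counter()
        for _, local in self.slides:
            merged.update(local)
        return {s: c / total for s, c in merged.items()
                if c >= self.min_support * total}


miner = SlidingWindowMiner(slides_per_window=2, min_support=0.5)
miner.add_slide([["a", "b"], ["a", "b"], ["a", "c"], ["b"]])
miner.add_slide([["a"], ["a", "c"], ["c"], ["a", "c"]])
print(miner.frequent_itemsets())  # {('a',): 0.75}
```

In this sketch the estimate is a lower bound: item c occurs in 4 of the 8 window transactions, yet it is not reported because its single occurrence in the first slide fell below the local threshold and was discarded. This is the kind of approximation error that a per-slide generation scheme has to trade against memory usage.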

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ciampi, A., Fumarola, F., Appice, A., Malerba, D. (2009). Approximate Frequent Itemset Discovery from Data Stream. In: Serra, R., Cucchiara, R. (eds) AI*IA 2009: Emergent Perspectives in Artificial Intelligence. AI*IA 2009. Lecture Notes in Computer Science (LNAI), vol 5883. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10291-2_16

  • DOI: https://doi.org/10.1007/978-3-642-10291-2_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10290-5

  • Online ISBN: 978-3-642-10291-2

  • eBook Packages: Computer Science, Computer Science (R0)
