An Efficient Algorithm for Frequent Itemset Mining on Data Streams

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 4065))

Abstract

To mine frequent itemsets on data streams efficiently, this paper proposes a new approach. The memory-efficient and accurate one-pass algorithm divides all frequent itemsets into frequent equivalence classes and prunes all redundant itemsets except those that represent the GLB (Greatest Lower Bound) and LUB (Least Upper Bound) of each frequent equivalence class; the number of GLBs and LUBs is much smaller than the number of frequent itemsets. To maintain these equivalence classes, a compact data structure, the frequent itemset enumerate tree (FIET), is proposed. A detailed experimental evaluation on synthetic and real datasets shows that the algorithm is very accurate in practice and requires significantly less memory than Jin and Agrawal’s one-pass algorithm.
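
The idea of keeping only the bounds of each equivalence class can be illustrated with a small batch sketch. It is not the paper's streaming algorithm and does not implement the FIET. It assumes, as is common in closed-itemset mining, that two itemsets are equivalent when exactly the same transactions contain them; the abstract does not state the paper's definition. The function names `frequent_equivalence_classes` and `bounds` are hypothetical, and mapping "largest member" and "minimal members" onto the paper's LUB and GLB is likewise an assumption.

```python
from collections import defaultdict
from itertools import combinations


def frequent_equivalence_classes(transactions, min_support):
    """Group frequent itemsets by the set of transaction ids containing them.

    Brute-force enumeration over all item combinations: suitable only for
    tiny illustrative inputs, not for real streams.
    """
    items = sorted(set().union(*transactions))
    classes = defaultdict(list)  # tidset -> itemsets sharing that tidset
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            tidset = frozenset(
                tid for tid, t in enumerate(transactions) if itemset <= t
            )
            if len(tidset) >= min_support:
                classes[tidset].append(itemset)
    return classes


def bounds(members):
    """Return (largest member, minimal members) of one equivalence class.

    The union of all members is contained in the same transactions as each
    member, so it belongs to the class and is its unique largest itemset.
    """
    upper = frozenset().union(*members)
    lowers = [m for m in members if not any(other < m for other in members)]
    return upper, lowers


# Hypothetical toy data: four transactions over the items a, b, c.
transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"c"}]
classes = frequent_equivalence_classes(transactions, min_support=2)

# Seven frequent itemsets collapse into three classes; each class is
# summarised by its largest member and its minimal members.
summary = {tuple(sorted(tids)): bounds(members) for tids, members in classes.items()}
```

In this toy input the class supported by transactions 0, 1 and 2 contains {a}, {b} and {a, b}; storing only the largest member {a, b} and the minimal members {a} and {b} is enough to recover the support of every itemset that lies between them.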

References

  1. Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, A.I.: Fast discovery of association rules. In: Fayyad, U., et al. (eds.) Advances in Knowledge Discovery and Data Mining, pp. 307–328. AAAI Press, Menlo Park (1996)

  2. Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proc. 1994 Int. Conf. Very Large Data Bases (VLDB 1994), Santiago, Chile, September 1994, pp. 487–499 (1994)

  3. Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and issues in data stream systems. In: Proceedings of the 2002 ACM Symposium on Principles of Database Systems (PODS 2002). ACM Press, New York (2002)

  4. Borgelt, C.: Apriori implementation, http://fuzzy.cs.UniMagdeburg.de/borgelt/Software

  5. Dobra, A., Gehrke, J., Garofalakis, M., Rastogi, R.: Processing complex aggregate queries over data streams. In: Proc. of the 2002 ACM SIGMOD Intl. Conf. on Management of Data (June 2002)

  6. Domingos, P., Hulten, G.: Mining high-speed data streams. In: Proceedings of the ACM Conference on Knowledge and Data Discovery (SIGKDD) (2000)

  7. Gehrke, J., Korn, F., Srivastava, D.: On computing correlated aggregates over continual data streams. In: Proc. of the 2001 ACM SIGMOD Intl. Conf. on Management of Data, pp. 13–24. ACM Press, New York (2001)

  8. Giannella, C., Han, J., Pei, J., Yan, X., Yu, P.S.: Mining frequent patterns in data streams at multiple time granularities. In: Proceedings of the NSF Workshop on Next Generation Data Mining (November 2002)

  9. Gibbons, P.B., Tirthapura, S.: Estimating simple functions on the union of data streams. In: Proc. of the 2001 ACM Symp. on Parallel Algorithms and Architectures, pp. 281–291. ACM Press, New York (2001)

  10. Goethals, B.: Fp-tree implementation, http://www.cs.helsinki.fi/u/goethals/software/index/html

  11. Jin, R., Agrawal, G.: An algorithm for in-core frequent itemset mining on streaming data (submitted, 2004)

  12. Zaki, M.J., Hsiao, C.: Charm: An efficient algorithm for closed itemset mining. In: 2nd SIAM Int’l Conf. on Data Mining (2002)

  13. Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. In: Proceedings of the ACM SIGMOD Conference on Management of Data (2000)

  14. Li, C., Cong, G., Tung, A.K.H., Wang, S.: Incremental Maintenance of Quotient Cube for Sum and Median. In: Proceedings of SIGKDD, Seattle, WA, USA, pp. 226–235 (August 2004)

  15. Manku, G.S., Motwani, R.: Approximate frequency counts over data streams. In: Proceedings of the Conference on Very Large Data Bases (VLDB), pp. 346–357 (2002)

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhi-jun, X., Hong, C., Li, C. (2006). An Efficient Algorithm for Frequent Itemset Mining on Data Streams. In: Perner, P. (ed.) Advances in Data Mining. Applications in Medicine, Web Mining, Marketing, Image and Signal Mining. ICDM 2006. Lecture Notes in Computer Science, vol 4065. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11790853_37

  • DOI: https://doi.org/10.1007/11790853_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-36036-0

  • Online ISBN: 978-3-540-36037-7

  • eBook Packages: Computer Science, Computer Science (R0)
