Frequent Itemset Mining over Data Streams


Part of the book series: Data-Centric Systems and Applications (DCSA)

Abstract

We study the problem of computing frequent elements in a data stream. Given a support threshold \(s \in [0, 1]\), an element is said to be frequent if it occurs more than \(sN\) times, where \(N\) denotes the current length of the stream. If we maintain a list of counters of the form 〈element, count〉, one counter per unique element encountered, we need \(N\) counters in the worst case. Many distributions are heavy-tailed in practice, so we would usually need far fewer than \(N\) counters. However, the number would still exceed \(1/s\), which is the maximum possible number of frequent elements. If we insist on identifying exact frequency counts, then \(\varOmega(N)\) space is necessary. This observation motivates the design of streaming techniques based on \(\epsilon\)-approximate frequency counts. We also discuss how these ideas extend to the problem of mining frequent itemsets over streams, along with relevant applications.
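
To make the idea of \(\epsilon\)-approximate frequency counts concrete, here is a minimal Python sketch. The chapter body is not part of this preview, so this is not necessarily the algorithm the chapter presents. It is an illustration based on the classic counter-based scheme of Misra and Gries, and the function names `approximate_counts` and `frequent_candidates` are hypothetical. It assumes \(0 < \epsilon < s\) and keeps at most \(\lceil 1/\epsilon \rceil\) counters, so each stored count undercounts the true frequency by less than \(\epsilon N\).

```python
import math


def approximate_counts(stream, epsilon):
    """Misra-Gries style summary (illustrative sketch, not the chapter's code).

    Keeps at most k = ceil(1/epsilon) counters. For every element, the stored
    count is at most its true frequency and falls short by less than
    epsilon * N, where N is the number of items seen so far.
    """
    k = math.ceil(1 / epsilon)
    counters = {}
    n = 0
    for item in stream:
        n += 1
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those at zero.
            # The incoming item itself is not stored in this step.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters, n


def frequent_candidates(counters, n, s, epsilon):
    """Report every element whose stored count exceeds (s - epsilon) * n.

    Any element occurring more than s * n times is reported (no false
    negatives); every reported element occurs more than (s - epsilon) * n times.
    """
    return {item for item, count in counters.items() if count > (s - epsilon) * n}
```

With \(s = 0.2\) and \(\epsilon = 0.05\), for instance, the summary never holds more than 20 counters regardless of the stream length, at the cost of possibly reporting elements whose true frequency lies between \((s - \epsilon)N\) and \(sN\).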

References

  1. R. Agrawal, R. Srikant, Fast algorithms for mining association rules, in Proc. of 20th Intl. Conf. on Very Large Data Bases (1994), pp. 487–499

  2. A. Arasu, G.S. Manku, Approximate counts and quantiles over sliding windows, in Proc. ACM Symposium on Principles of Database Systems (2004)

  3. C. Estan, G. Varghese, New directions in traffic measurement and accounting: focusing on the elephants, ignoring the mice. ACM Trans. Comput. Syst. 21(3), 270–313 (2003)

  4. M. Fang, N. Shivakumar, H. Garcia-Molina, R. Motwani, J. Ullman, Computing iceberg queries efficiently, in Proc. of 24th Intl. Conf. on Very Large Data Bases (1998), pp. 299–310

  5. R.M. Karp, C.H. Papadimitriou, S. Shenker, A simple algorithm for finding frequent elements in streams and bags. ACM Trans. Database Syst. 28, 51–55 (2003)

  6. G.S. Manku, R. Motwani, Approximate frequency counts over data streams, in Proc. of 28th Intl. Conf. on Very Large Data Bases (2002), pp. 346–357

  7. J. Misra, D. Gries, Finding repeated elements. Sci. Comput. Program. 2(2), 143–152 (1982)


Author information

Correspondence to Gurmeet Singh Manku.

Copyright information

© 2016 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Manku, G.S. (2016). Frequent Itemset Mining over Data Streams. In: Garofalakis, M., Gehrke, J., Rastogi, R. (eds) Data Stream Management. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28608-0_10

  • DOI: https://doi.org/10.1007/978-3-540-28608-0_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28607-3

  • Online ISBN: 978-3-540-28608-0

  • eBook Packages: Computer Science, Computer Science (R0)
