An Improved Data Stream Summary: The Count-Min Sketch and Its Applications

  • Conference paper
LATIN 2004: Theoretical Informatics (LATIN 2004)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2976))

Abstract

We introduce a new sublinear space data structure, the Count-Min Sketch, for summarizing data streams. Our sketch allows fundamental queries in data stream summarization such as point, range, and inner product queries to be approximately answered very quickly; in addition, it can be applied to solve several important problems in data streams such as finding quantiles, frequent items, etc. The time and space bounds we show for using the CM sketch to solve these problems significantly improve those previously known, typically from 1/ε² to 1/ε in factor.
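
The abstract does not spell out the construction, so the following is only a minimal Python sketch of the standard Count-Min layout: d = ⌈ln(1/δ)⌉ rows of w = ⌈e/ε⌉ counters, one hash function per row, and a point query that takes the minimum across rows. The class name, the `seed` parameter, and the particular hash family ((a·x + b) mod p) mod w are illustrative assumptions rather than details taken from this page, and the sketch handles non-negative updates only.

```python
import math
import random


class CountMinSketch:
    """Illustrative Count-Min sketch: `depth` rows of `width` counters."""

    # Mersenne prime used by the assumed hash family ((a*x + b) mod p) mod w.
    _PRIME = (1 << 61) - 1

    def __init__(self, epsilon, delta, seed=0):
        # Standard sizing: w = ceil(e / epsilon), d = ceil(ln(1 / delta)).
        self.width = math.ceil(math.e / epsilon)
        self.depth = max(1, math.ceil(math.log(1 / delta)))
        rng = random.Random(seed)
        # One (a, b) pair per row; a is nonzero.
        self.hashes = [
            (rng.randrange(1, self._PRIME), rng.randrange(self._PRIME))
            for _ in range(self.depth)
        ]
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total = 0  # sum of all counts seen so far

    def _index(self, row, item):
        a, b = self.hashes[row]
        return ((a * hash(item) + b) % self._PRIME) % self.width

    def update(self, item, count=1):
        """Add a non-negative `count` to the counter for `item` in every row."""
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count
        self.total += count

    def query(self, item):
        """Point query: the smallest counter that `item` maps to."""
        return min(
            self.table[row][self._index(row, item)]
            for row in range(self.depth)
        )


if __name__ == "__main__":
    cms = CountMinSketch(epsilon=0.01, delta=0.01)
    for value in [3, 3, 7, 3, 9]:
        cms.update(value)
    print(cms.query(3))  # never less than the true count of 3
```

With non-negative updates, collisions can only inflate a counter, so the estimate never falls below the true count; with probability at least 1 − δ it exceeds the true count by no more than ε times the total of all counts. The table holds w·d counters, which is where the 1/ε space dependence mentioned in the abstract comes from.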

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cormode, G., Muthukrishnan, S. (2004). An Improved Data Stream Summary: The Count-Min Sketch and Its Applications. In: Farach-Colton, M. (eds) LATIN 2004: Theoretical Informatics. LATIN 2004. Lecture Notes in Computer Science, vol 2976. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24698-5_7

  • DOI: https://doi.org/10.1007/978-3-540-24698-5_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21258-4

  • Online ISBN: 978-3-540-24698-5

  • eBook Packages: Springer Book Archive
