CR-precis: A Deterministic Summary Structure for Update Data Streams

  • Conference paper
Combinatorics, Algorithms, Probabilistic and Experimental Methodologies (ESCAPE 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4614)

Abstract

We present deterministic sub-linear space algorithms for problems over update data streams, including estimating frequencies of items and ranges, finding approximate frequent items and approximate φ-quantiles, estimating inner products, constructing near-optimal B-bucket histograms, and estimating entropy. We also present improved lower bounds for several problems over update data streams.
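
The abstract does not describe the summary structure itself. As a rough illustration of how a deterministic, sub-linear summary can answer item-frequency queries over an update stream, the following is a minimal sketch under stated assumptions: counters are indexed by residues modulo distinct primes, and the stream never drives any item's frequency below zero. The names `ResidueSketch`, `first_primes_at_least`, `k`, and `t` are hypothetical and chosen for this example; this is not presented as the paper's exact construction or parameter choice.

```python
from typing import List


def first_primes_at_least(k: int, t: int) -> List[int]:
    """Return the t smallest primes that are >= k, found by trial division."""
    primes: List[int] = []
    candidate = max(k, 2)
    while len(primes) < t:
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            primes.append(candidate)
        candidate += 1
    return primes


class ResidueSketch:
    """Deterministic counter tables, one per prime (illustrative assumption).

    Table j has p_j counters; item i is mapped to counter i mod p_j.
    Two distinct items below n can share a counter in only a bounded
    number of tables, because the product of the primes on which they
    agree must divide their difference, which is smaller than n.
    """

    def __init__(self, k: int, t: int) -> None:
        self.primes = first_primes_at_least(k, t)
        self.tables = [[0] * p for p in self.primes]

    def update(self, item: int, delta: int) -> None:
        """Apply an update (item, delta); delta may be negative."""
        for p, table in zip(self.primes, self.tables):
            table[item % p] += delta

    def estimate(self, item: int) -> int:
        """Upper bound on the item's frequency, assuming all frequencies >= 0."""
        return min(table[item % p] for p, table in zip(self.primes, self.tables))
```

With non-negative frequencies every counter that an item maps to is at least that item's true frequency, so taking the minimum across tables never underestimates. No randomness is involved, so the same stream always produces the same summary and the same answers.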



Editor information

Bo Chen, Mike Paterson, Guochuan Zhang

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ganguly, S., Majumder, A. (2007). CR-precis: A Deterministic Summary Structure for Update Data Streams. In: Chen, B., Paterson, M., Zhang, G. (eds) Combinatorics, Algorithms, Probabilistic and Experimental Methodologies. ESCAPE 2007. Lecture Notes in Computer Science, vol 4614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74450-4_5

  • DOI: https://doi.org/10.1007/978-3-540-74450-4_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74449-8

  • Online ISBN: 978-3-540-74450-4

  • eBook Packages: Computer Science, Computer Science (R0)
