Abstract
We present deterministic sub-linear space algorithms for problems over update data streams, including estimating frequencies of items and ranges, finding approximate frequent items and approximate φ-quantiles, estimating inner products, constructing near-optimal B-bucket histograms, and estimating entropy. We also present improved lower bounds for several problems over update data streams.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ganguly, S., Majumder, A. (2007). CR-precis: A Deterministic Summary Structure for Update Data Streams. In: Chen, B., Paterson, M., Zhang, G. (eds) Combinatorics, Algorithms, Probabilistic and Experimental Methodologies. ESCAPE 2007. Lecture Notes in Computer Science, vol 4614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74450-4_5
DOI: https://doi.org/10.1007/978-3-540-74450-4_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74449-8
Online ISBN: 978-3-540-74450-4
eBook Packages: Computer Science (R0)