Abstract
The problem of estimating the k-th frequency moment \(F_k\), for any non-negative k, over a data stream by looking at the items exactly once as they arrive was considered in a seminal paper by Alon, Matias and Szegedy [1,2]. The space complexity of their algorithm is \(\tilde{O}(n^{1-\frac{1}{k}})\). For \(k > 2\), their technique does not apply to data streams with arbitrary insertions and deletions. In this paper, we present an algorithm for estimating \(F_k\) for \(k > 2\) over general update streams whose space complexity is \(\tilde{O}(n^{1-\frac{1}{k-1}})\) and whose time complexity for processing each stream update is \(\tilde{O}(1)\).
Recently, an algorithm for estimating \(F_k\) over general update streams with similar space complexity was published by Coppersmith and Kumar [7]. Our technique (a) is fundamentally different from the technique used in [7], (b) is simpler and symmetric, and (c) is more efficient in terms of the time required to process a stream update (\(\tilde{O}(1)\) compared with \(\tilde{O}(n^{1-\frac{1}{k-1}})\)).
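To make the estimated quantity concrete, the following is a minimal sketch of an exact baseline, not the algorithm of this paper. It assumes the usual definition \(F_k = \sum_i |f_i|^k\), where \(f_i\) is the net frequency of item \(i\) after all insertions and deletions; the absolute value, the `(item, delta)` update format, and the function name `exact_frequency_moment` are illustrative assumptions. The baseline keeps one counter per distinct item, so its space grows linearly with the number of distinct items, which is the cost that sublinear-space estimators aim to avoid.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def exact_frequency_moment(updates: Iterable[Tuple[int, int]], k: float) -> float:
    """Exact F_k of a general update stream (hypothetical helper).

    Each update is a pair (item, delta); delta > 0 is an insertion and
    delta < 0 is a deletion. Uses one counter per distinct item.
    """
    freq = defaultdict(int)
    for item, delta in updates:
        freq[item] += delta  # net frequency f_i after insertions and deletions
    # Skip zero frequencies so that F_0 counts only items still present.
    return float(sum(abs(f) ** k for f in freq.values() if f != 0))


# Toy stream: item 1 ends with frequency 2, item 2 with 0, item 3 with 2.
stream = [(1, +3), (2, +1), (1, -1), (3, +2), (2, -1)]
f3 = exact_frequency_moment(stream, 3)  # 2**3 + 2**3
```

On this toy stream the net frequencies are 2, 0 and 2, so the third moment is 16 and the zeroth moment (the number of items with non-zero frequency) is 2.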
References
[1] Alon, N., Matias, Y., Szegedy, M.: The Space Complexity of Approximating the Frequency Moments. In: Proceedings of the 28th Annual ACM Symposium on the Theory of Computing STOC 1996, Philadelphia, Pennsylvania, May 1996, pp. 20–29 (1996)
[2] Alon, N., Matias, Y., Szegedy, M.: The space complexity of approximating frequency moments. Journal of Computer and System Sciences 58(1), 137–147 (1998)
[3] Bar-Yossef, Z., Jayram, T.S., Kumar, R., Sivakumar, D.: An information statistics approach to data stream and communication complexity. In: Proceedings of the 34th ACM Symposium on Theory of Computing, STOC 2002, pp. 209–218. Princeton, NJ (2002)
[4] Bar-Yossef, Z., Jayram, T.S., Kumar, R., Sivakumar, D., Trevisan, L.: Counting distinct elements in a data stream. In: Rolim, J.D.P., Vadhan, S.P. (eds.) RANDOM 2002. LNCS, vol. 2483, p. 1. Springer, Heidelberg (2002)
[5] Chakrabarti, A., Khot, S., Sun, X.: Near-Optimal Lower Bounds on the Multi-Party Communication Complexity of Set Disjointness. In: Proceedings of the 18th Annual IEEE Conference on Computational Complexity, CCC 2003, Aarhus, Denmark (2003)
[6] Charikar, M., Chen, K., Farach-Colton, M.: Finding frequent items in data streams. In: Proceedings of the 29th International Colloquium on Automata Languages and Programming (2002)
[7] Coppersmith, D., Kumar, R.: An improved data stream algorithm for estimating frequency moments. In: Proceedings of the Fifteenth ACM SIAM Symposium on Discrete Algorithms, New Orleans, LA (2004)
[8] Cormode, G., Muthukrishnan, S.: What’s Hot and What’s Not: Tracking Most Frequent Items Dynamically. In: Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, San Diego, California (May 2003)
[9] Feigenbaum, J., Kannan, S., Strauss, M., Viswanathan, M.: An Approximate L1-Difference Algorithm for Massive Data Streams. In: Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science, New York, NY (October 1999)
[10] Flajolet, P., Martin, G.N.: Probabilistic Counting Algorithms for Database Applications. Journal of Computer and System Sciences 31(2), 182–209 (1985)
[11] Ganguly, S.: A bifocal technique for estimating frequency moments over data streams (April 2004) (manuscript)
[12] Ganguly, S., Garofalakis, M., Rastogi, R.: Processing Set Expressions over Continuous Update Streams. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, San Diego, CA (2003)
[13] Indyk, P.: Stable Distributions, Pseudo Random Generators, Embeddings and Data Stream Computation. In: Proceedings of the 41st Annual IEEE Symposium on Foundations of Computer Science, Redondo Beach, CA, November 2000, pp. 189–197 (2000)
[14] Saks, M., Sun, X.: Space lower bounds for distance approximation in the data stream model. In: Proceedings of the 34th ACM Symposium on Theory of Computing, STOC 2002 (2002)
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ganguly, S. (2004). Estimating Frequency Moments of Data Streams Using Random Linear Combinations. In: Jansen, K., Khanna, S., Rolim, J.D.P., Ron, D. (eds) Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. RANDOM APPROX 2004. Lecture Notes in Computer Science, vol 3122. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27821-4_33
DOI: https://doi.org/10.1007/978-3-540-27821-4_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22894-3
Online ISBN: 978-3-540-27821-4
eBook Packages: Springer Book Archive