Estimating Frequency Moments of Data Streams Using Random Linear Combinations

  • Conference paper
Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (RANDOM 2004, APPROX 2004)

Abstract

The problem of estimating the \(k\)th frequency moment \(F_k\), for any non-negative \(k\), over a data stream by looking at the items exactly once as they arrive was considered in a seminal paper by Alon, Matias and Szegedy [1,2]. The space complexity of their algorithm is \(\tilde{O}(n^{1-\frac{1}{k}})\). For \(k > 2\), their technique does not apply to data streams with arbitrary insertions and deletions. In this paper, we present an algorithm for estimating \(F_k\) for \(k > 2\) over general update streams, whose space complexity is \(\tilde{O}(n^{1-\frac{1}{k-1}})\) and whose time complexity for processing each stream update is \(\tilde{O}(1)\).
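To make the quantity being estimated concrete, here is a minimal Python sketch that computes \(F_k\) exactly over a stream of insertions and deletions. It assumes the usual definition \(F_k = \sum_i |f_i|^k\), where \(f_i\) is the net frequency of item \(i\), and it assumes updates arrive as (item, delta) pairs. The function name `frequency_moment` and the sample stream are illustrative only. This is an exact baseline that stores one counter per distinct item; it is not the sublinear-space algorithm of the paper.

```python
from collections import defaultdict


def frequency_moment(stream, k):
    """Exact F_k of a general update stream (illustrative baseline).

    stream: iterable of (item, delta) pairs; a positive delta is an
            insertion and a negative delta is a deletion (assumed format).
    k:      non-negative moment order.
    """
    freq = defaultdict(int)
    for item, delta in stream:
        freq[item] += delta  # net frequency f_i after all updates

    if k == 0:
        # F_0 counts the items whose net frequency is non-zero.
        return sum(1 for f in freq.values() if f != 0)
    return sum(abs(f) ** k for f in freq.values() if f != 0)


# Hypothetical stream: net frequencies are a = 2, b = 0, c = 1.
updates = [("a", 3), ("b", 2), ("a", -1), ("c", 1), ("b", -2)]
print(frequency_moment(updates, 3))  # 2**3 + 1**3 = 9
```

The baseline needs memory proportional to the number of distinct items, which is the cost that streaming estimators with sublinear space bounds are designed to avoid.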

Recently, an algorithm for estimating \(F_k\) over general update streams with similar space complexity has been published by Coppersmith and Kumar [7]. Our technique (a) is basically different from the technique used by [7], (b) is simpler and symmetric, and (c) is more efficient in terms of the time required to process a stream update (\(\tilde{O}(1)\) compared with \(\tilde{O}(n^{1-\frac{1}{k-1}})\)).

References

  1. Alon, N., Matias, Y., Szegedy, M.: The Space Complexity of Approximating the Frequency Moments. In: Proceedings of the 28th Annual ACM Symposium on the Theory of Computing STOC 1996, Philadelphia, Pennsylvania, May 1996, pp. 20–29 (1996)

  2. Alon, N., Matias, Y., Szegedy, M.: The space complexity of approximating frequency moments. Journal of Computer and System Sciences 58(1), 137–147 (1998)

  3. Bar-Yossef, Z., Jayram, T.S., Kumar, R., Sivakumar, D.: An information statistics approach to data stream and communication complexity. In: Proceedings of the 34th ACM Symposium on Theory of Computing, STOC 2002, pp. 209–218. Princeton, NJ (2002)

  4. Bar-Yossef, Z., Jayram, T.S., Kumar, R., Sivakumar, D., Trevisan, L.: Counting distinct elements in a data stream. In: Rolim, J.D.P., Vadhan, S.P. (eds.) RANDOM 2002. LNCS, vol. 2483, p. 1. Springer, Heidelberg (2002)

  5. Chakrabarti, A., Khot, S., Sun, X.: Near-Optimal Lower Bounds on the Multi-Party Communication Complexity of Set Disjointness. In: Proceedings of the 18th Annual IEEE Conference on Computational Complexity, CCC 2003, Aarhus, Denmark (2003)

  6. Charikar, M., Chen, K., Farach-Colton, M.: Finding frequent items in data streams. In: Proceedings of the 29th International Colloquium on Automata Languages and Programming (2002)

  7. Coppersmith, D., Kumar, R.: An improved data stream algorithm for estimating frequency moments. In: Proceedings of the Fifteenth ACM SIAM Symposium on Discrete Algorithms, New Orleans, LA (2004)

  8. Cormode, G., Muthukrishnan, S.: What’s Hot and What’s Not: Tracking Most Frequent Items Dynamically. In: Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, San Diego, California (May 2003)

  9. Feigenbaum, J., Kannan, S., Strauss, M., Viswanathan, M.: An Approximate L1-Difference Algorithm for Massive Data Streams. In: Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science, New York, NY (October 1999)

  10. Flajolet, P., Martin, G.N.: Probabilistic Counting Algorithms for Database Applications. Journal of Computer and System Sciences 31(2), 182–209 (1985)

  11. Ganguly, S.: A bifocal technique for estimating frequency moments over data streams (April 2004) (manuscript)

  12. Ganguly, S., Garofalakis, M., Rastogi, R.: Processing Set Expressions over Continuous Update Streams. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data, San Diego, CA (2003)

  13. Indyk, P.: Stable Distributions, Pseudo Random Generators, Embeddings and Data Stream Computation. In: Proceedings of the 41st Annual IEEE Symposium on Foundations of Computer Science, Redondo Beach, CA, November 2000, pp. 189–197 (2000)

  14. Saks, M., Sun, X.: Space lower bounds for distance approximation in the data stream model. In: Proceedings of the 34th ACM Symposium on Theory of Computing, STOC 2002 (2002)

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ganguly, S. (2004). Estimating Frequency Moments of Data Streams Using Random Linear Combinations. In: Jansen, K., Khanna, S., Rolim, J.D.P., Ron, D. (eds) Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. RANDOM 2004, APPROX 2004. Lecture Notes in Computer Science, vol 3122. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27821-4_33

  • DOI: https://doi.org/10.1007/978-3-540-27821-4_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22894-3

  • Online ISBN: 978-3-540-27821-4

  • eBook Packages: Springer Book Archive
