
An Approximate $L^p$-Difference Algorithm for Massive Data Streams

Extended Abstract

  • Conference paper
STACS 2000 (STACS 2000)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 1770))


Abstract

Several recent papers have shown how to approximate the difference $\sum_i |a_i - b_i|$ or $\sum_i |a_i - b_i|^2$ between two functions, when the function values $a_i$ and $b_i$ are given in a data stream, and their order is chosen by an adversary. These algorithms use little space (much less than would be needed to store the entire stream) and little time to process each item in the stream and give approximations with small relative error. Using different techniques, we show how to approximate the $L^p$-difference $\sum_i |a_i - b_i|^p$ for any rational-valued $p \in (0,2]$, with comparable efficiency and error. We also show how to approximate $\sum_i |a_i - b_i|^p$ for larger values of $p$ but with a worse error guarantee. These results can be used to assess the difference between two chronologically or physically separated massive data sets, making one quick pass over each data set, without buffering the data or requiring the data source to pause.
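To make the streaming setting concrete, here is a minimal Python sketch for the $p = 2$ case only. It follows the random-sign ("tug-of-war") style of the earlier $L^2$ results by Alon, Matias, and Szegedy and is not the technique of this paper, which handles general $p$ by different means. The names `L2DifferenceSketch`, `estimate_l2_difference`, and `exact_lp_difference`, the parameters `groups`, `per_group`, and `seed`, and the choice of hash family are all assumptions made for illustration.

```python
import random
from statistics import median

PRIME = (1 << 61) - 1  # field size for the polynomial hash (an assumption)


class L2DifferenceSketch:
    """Small linear summary of a stream of (index, value) items.

    Two sketches built with the same seed can be subtracted to estimate
    sum_i |a_i - b_i|^2 without storing either stream.
    """

    def __init__(self, groups=5, per_group=64, seed=0):
        rng = random.Random(seed)
        self.groups = groups
        self.per_group = per_group
        self.seed = seed
        size = groups * per_group
        # One random degree-3 polynomial per counter gives 4-wise
        # independent hash values, from which we take a +/-1 sign.
        self.coeffs = [[rng.randrange(PRIME) for _ in range(4)]
                       for _ in range(size)]
        self.counters = [0] * size

    def _sign(self, j, i):
        c0, c1, c2, c3 = self.coeffs[j]
        h = (((c3 * i + c2) * i + c1) * i + c0) % PRIME
        return 1 if h & 1 else -1

    def update(self, i, value):
        """Process one stream item; items may arrive in any order."""
        for j in range(len(self.counters)):
            self.counters[j] += value * self._sign(j, i)


def estimate_l2_difference(sketch_a, sketch_b):
    """Median of group means of squared counter differences."""
    if (sketch_a.seed, sketch_a.groups, sketch_a.per_group) != \
       (sketch_b.seed, sketch_b.groups, sketch_b.per_group):
        raise ValueError("sketches must share seed and shape")
    squares = [(x - y) ** 2
               for x, y in zip(sketch_a.counters, sketch_b.counters)]
    n = sketch_a.per_group
    means = [sum(squares[g * n:(g + 1) * n]) / n
             for g in range(sketch_a.groups)]
    return median(means)


def exact_lp_difference(a, b, p):
    """Reference value sum_i |a_i - b_i|^p, storing both inputs as dicts."""
    keys = set(a) | set(b)
    return sum(abs(a.get(k, 0) - b.get(k, 0)) ** p for k in keys)


if __name__ == "__main__":
    rng = random.Random(42)
    a = {i: rng.randrange(100) for i in range(500)}
    b = {i: v + rng.randrange(-5, 6) for i, v in a.items()}
    sa, sb = L2DifferenceSketch(seed=7), L2DifferenceSketch(seed=7)
    for i, v in a.items():
        sa.update(i, v)
    for i, v in b.items():
        sb.update(i, v)
    print("exact:   ", exact_lp_difference(a, b, 2))
    print("estimate:", estimate_l2_difference(sa, sb))
```

Because each counter is a linear function of the stream, the two sketches can be built at different times or places and combined afterwards, which matches the one-pass, no-buffering use case described above. The space used depends on `groups * per_group`, not on the stream length.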

Part of this work was done while the first author was visiting AT&T Labs.

An expanded version of this paper is available in preprint form at http://www.research.att.com/~mstrauss/pubs/lp.ps




Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fong, J.H., Strauss, M.J. (2000). An Approximate $L^p$-Difference Algorithm for Massive Data Streams. In: Reichel, H., Tison, S. (eds) STACS 2000. STACS 2000. Lecture Notes in Computer Science, vol 1770. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46541-3_16


  • DOI: https://doi.org/10.1007/3-540-46541-3_16


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67141-1

  • Online ISBN: 978-3-540-46541-6

  • eBook Packages: Springer Book Archive
