Abstract
Several recent papers have shown how to approximate the difference $\sum_i |a_i - b_i|$ or $\sum_i |a_i - b_i|^2$ between two functions, when the function values $a_i$ and $b_i$ are given in a data stream, and their order is chosen by an adversary. These algorithms use little space (much less than would be needed to store the entire stream) and little time to process each item in the stream and give approximations with small relative error. Using different techniques, we show how to approximate the $L^p$-difference $\sum_i |a_i - b_i|^p$ for any rational-valued $p \in (0,2]$, with comparable efficiency and error. We also show how to approximate $\sum_i |a_i - b_i|^p$ for larger values of $p$ but with a worse error guarantee. These results can be used to assess the difference between two chronologically or physically separated massive data sets, making one quick pass over each data set, without buffering the data or requiring the data source to pause.
Part of this work was done while the first author was visiting AT&T Labs.
An expanded version of this paper is available in preprint form at http://www.research.att.com/~mstrauss/pubs/lp.ps
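To make the quantity in the abstract concrete, the following is a minimal sketch of an exact baseline, not the algorithm of the paper. It assumes each stream item is an (index, value) pair and that items may arrive in any order; the function name `exact_lp_difference` and this item format are illustrative assumptions. The baseline keeps one counter per index, so its space grows linearly with the number of distinct indices, which is the cost the streaming algorithms described above are designed to avoid.

```python
from collections import defaultdict


def exact_lp_difference(stream_a, stream_b, p):
    """Return sum_i |a_i - b_i|**p exactly.

    Hypothetical baseline for illustration only: each stream yields
    (index, value) pairs in arbitrary order, and one counter is stored
    per index (linear space, unlike a small-space streaming sketch).
    """
    if p <= 0:
        raise ValueError("p must be positive")
    diff = defaultdict(float)
    for i, value in stream_a:   # one pass over the first data set
        diff[i] += value
    for i, value in stream_b:   # one pass over the second data set
        diff[i] -= value
    return sum(abs(d) ** p for d in diff.values())


# Items arrive in different orders in the two streams.
stream_a = [(2, 5.0), (0, 1.0), (1, 3.0)]
stream_b = [(1, 1.0), (0, 4.0), (2, 5.0)]
l1 = exact_lp_difference(stream_a, stream_b, 1)   # |1-4| + |3-1| + |5-5| = 5.0
l2 = exact_lp_difference(stream_a, stream_b, 2)   # 9 + 4 + 0 = 13.0
```

An approximation algorithm of the kind described in the abstract would aim to return a value within a small relative error of this exact result while storing far less than the full table of differences.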
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fong, J.H., Strauss, M.J. (2000). An Approximate $L^p$-Difference Algorithm for Massive Data Streams. In: Reichel, H., Tison, S. (eds) STACS 2000. Lecture Notes in Computer Science, vol 1770. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46541-3_16
DOI: https://doi.org/10.1007/3-540-46541-3_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67141-1
Online ISBN: 978-3-540-46541-6
eBook Packages: Springer Book Archive