Synonyms
Distance between streams; Datastream distance
Definition
In many applications, it is useful to think of a datastream as representing a vector or a point in space. Given two datastreams, along with a distance or similarity measure, the distance (or similarity) between the two streams is simply the distance (respectively, similarity) between the two points that the datastreams represent. Due to the enormous amount of data being processed, datastream algorithms are allowed just a single, sequential pass over the data; in some settings, the algorithm may take a few passes. The algorithm itself must use very little memory, typically polylogarithmic in the amount of data, but is allowed to return approximate answers.
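As an illustration of the small-memory, approximate-answer regime described above, the following is a minimal sketch of one well-known technique: a random-sign linear sketch in the spirit of Alon, Matias and Szegedy, used here to estimate the Euclidean (L2) distance between two streams. It is not necessarily the method this entry goes on to describe. The names `L2Sketch`, `sign`, and `estimate_l2_distance` are hypothetical. The code assumes that each stream arrives as (index, value) updates, and it uses a cryptographic hash as a stand-in for the 4-wise independent sign family that the formal analysis requires.

```python
import hashlib
import math


def sign(seed: int, row: int, index: int) -> int:
    """Return a pseudo-random +1 or -1 for the pair (row, index).

    Assumption: a hash stands in for a 4-wise independent family of signs.
    """
    data = f"{seed}:{row}:{index}".encode()
    digest = hashlib.blake2b(data, digest_size=1).digest()
    return 1 if digest[0] & 1 else -1


class L2Sketch:
    """Keeps `rows` counters, regardless of how many items the stream has."""

    def __init__(self, rows: int, seed: int = 0):
        self.rows = rows
        self.seed = seed
        self.counters = [0.0] * rows

    def update(self, index: int, value: float) -> None:
        # Each counter accumulates a random signed sum of the stream's values.
        for r in range(self.rows):
            self.counters[r] += sign(self.seed, r, index) * value


def estimate_l2_distance(a: L2Sketch, b: L2Sketch) -> float:
    """Estimate the L2 distance between the vectors behind two sketches."""
    if (a.rows, a.seed) != (b.rows, b.seed):
        raise ValueError("sketches must share the same rows and seed")
    # The squared difference of matching counters has expectation ||x - y||^2;
    # averaging over rows reduces the variance of the estimate.
    squared = [(x - y) ** 2 for x, y in zip(a.counters, b.counters)]
    return math.sqrt(sum(squared) / len(squared))


# Usage: sketch two short streams in a single pass each, then compare.
first, second = L2Sketch(rows=64, seed=7), L2Sketch(rows=64, seed=7)
for i, (u, v) in enumerate([(1.0, 2.0), (4.0, 0.0), (3.0, 3.0), (0.0, 5.0)]):
    first.update(i, u)
    second.update(i, v)
print(estimate_l2_distance(first, second))
```

Because the sketch is linear, the difference of two sketches equals the sketch of the difference vector, which is why the two streams can be summarized independently and compared afterwards. The number of rows trades memory for accuracy.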
There are two frequently used datastream models. In the time series model, a vector, \( \overrightarrow{x} \), is simply represented as data items arriving in order of their indices: \( x_1, x_2, x_3, \ldots \). That is, the value of the \( i \)th item of the stream is precisely the...
Recommended Reading
Alon N, Gibbons P, Matias Y, Szegedy M. Tracking join and self-join sizes in limited storage. In: Proceedings of the 18th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems; 1999. p. 10–20.
Alon N, Matias Y, Szegedy M. The space complexity of approximating the frequency moments. In: Proceedings of the 28th ACM Symposium on Theory of Computing; 1996. p. 20–9.
Broder A, Charikar M, Frieze A, Mitzenmacher M. Min-wise independent permutations. In: Proceedings of the 30th ACM Symposium on Theory of Computing; 1998. p. 327–36.
Chambers JM, Mallows CL, Stuck BW. A method for simulating stable random variables. J Am Stat Assoc. 1976;71(354):340–4.
Cohen E. Size-estimation framework with applications to transitive closure and reachability. J Comput Syst Sci. 1997;55(3):441–53.
Cohen E, Datar M, Fujiwara S, Gionis A, Indyk P, Motwani R, Ullman J. Finding interesting associations without support pruning. In: Proceedings of the 16th International Conference on Data Engineering; 2000.
Cormode G, Datar M, Indyk P, Muthukrishnan S. Comparing data streams using hamming norms. In: Proceedings of the 28th International Conference on Very Large Data Bases; 2002. p. 335–45.
Datar M, Gionis A, Indyk P, Motwani R. Maintaining stream statistics over sliding windows. In: Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms; 2002. p. 635–44.
Datar M, Muthukrishnan S. Estimating rarity and similarity on data stream windows. In: Proceedings of the 10th European Symposium on Algorithms; 2002.
Feigenbaum J, Kannan S, Strauss M, Viswanathan M. An approximate l1-difference algorithm for massive data streams. In: Proceedings of the 40th Annual Symposium on Foundations of Computer Science; 1999.
Flajolet P, Martin G. Probabilistic counting. In: Proceedings of the 24th Annual Symposium on Foundations of Computer Science; 1983. p. 76–82.
Indyk P. Stable distributions, pseudorandom generators, embeddings and data stream computation. In: Proceedings of the 41st Annual Symposium on Foundations of Computer Science; 2000. p. 189–97.
Indyk P. A small approximately min-wise independent family of hash functions. J Algorithms. 2001;38(1):84–90.
On the distributional complexity of disjointness. J Comput Sci Syst. 1984;2.
Saks M, Sun X. The space complexity of approximating the frequency moments. In: Proceedings of the 34th ACM Symposium on Theory of Computing; 2002.
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Vee, E. (2018). Stream Similarity Mining. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_373
DOI: https://doi.org/10.1007/978-1-4614-8265-9_373
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering