Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Stream Similarity Mining

  • Erik Vee
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_373

Synonyms

Distance between streams; Datastream distance

Definition

In many applications, it is useful to think of a datastream as representing a vector or a point in space. Given two datastreams, along with a distance or similarity measure, the distance (or similarity) between the two streams is simply the distance (respectively, similarity) between the two points that the datastreams represent. Due to the enormous amount of data being processed, datastream algorithms are allowed just a single, sequential pass over the data; in some settings, the algorithm may take a few passes. The algorithm itself must use very little memory, typically polylogarithmic in the amount of data, but is allowed to return approximate answers.
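As a rough illustration of the definition above, the following sketch estimates the Euclidean distance between two streams in a single pass each, keeping only a fixed number of counters per stream. It follows the spirit of the "tug-of-war" random-sign sketch of Alon, Matias, and Szegedy, but it is not taken from this entry: the class name `L2Sketch`, the `(index, increment)` update format, and the use of a BLAKE2b hash as a heuristic stand-in for a 4-wise independent sign family are all assumptions made for the example.

```python
import hashlib
import math


class L2Sketch:
    """Small linear summary of the implicit vector x defined by a stream."""

    def __init__(self, num_counters=256, seed=0):
        # One 512-bit hash supplies the sign bits, so at most 512 counters.
        if not 1 <= num_counters <= 512:
            raise ValueError("num_counters must be between 1 and 512")
        self.num_counters = num_counters
        self.seed = seed
        self.counters = [0] * num_counters

    def _sign_bits(self, index):
        # Bit j of this hash picks the +1/-1 sign used by counter j.
        # Assumption: a cryptographic hash replaces the 4-wise independent
        # hash family used in the formal analysis.
        key = f"{self.seed}:{index}".encode()
        digest = hashlib.blake2b(key, digest_size=64).digest()
        return int.from_bytes(digest, "big")

    def update(self, index, delta=1):
        """Process one stream item, meaning x[index] += delta."""
        bits = self._sign_bits(index)
        for j in range(self.num_counters):
            if (bits >> j) & 1:
                self.counters[j] += delta
            else:
                self.counters[j] -= delta

    def distance_to(self, other):
        """Estimate the Euclidean distance between the two summarized vectors."""
        if (self.num_counters, self.seed) != (other.num_counters, other.seed):
            raise ValueError("sketches must share num_counters and seed")
        # Each squared counter difference is an unbiased estimate of the
        # squared distance; averaging reduces the variance.
        squares = [(a - b) ** 2 for a, b in zip(self.counters, other.counters)]
        return math.sqrt(sum(squares) / len(squares))


# Two hypothetical streams whose vectors differ in exactly 1000 coordinates,
# so the true Euclidean distance is sqrt(1000), roughly 31.6.
sketch_a, sketch_b = L2Sketch(), L2Sketch()
for i in range(1000):
    sketch_a.update(i, 1)
for i in range(500, 1500):
    sketch_b.update(i, 1)
print(sketch_a.distance_to(sketch_b))
```

Because the summary is linear in the stream's vector, the order of updates does not matter, and the difference of two sketches is itself a sketch of the difference vector. The memory used depends on the number of counters rather than on the length of the stream, at the cost of returning an approximate answer. A more careful variant would take a median of several such averages to control the failure probability.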

There are two frequently used datastream models. In the time series model, a vector, \( \overrightarrow{x} \)


Recommended Reading

  1. Alon N, Gibbons P, Matias Y, Szegedy M. Tracking join and self-join sizes in limited storage. In: Proceedings of the 18th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems; 1999. p. 10–20.
  2. Alon N, Matias Y, Szegedy M. The space complexity of approximating the frequency moments. In: Proceedings of the 28th ACM Symposium on Theory of Computing; 1996. p. 20–9.
  3. Broder A, Charikar M, Frieze A, Mitzenmacher M. Min-wise independent permutations. In: Proceedings of the 30th ACM Symposium on Theory of Computing; 1998. p. 327–36.
  4. Chambers JM, Mallows CL, Stuck BW. A method for simulating stable random variables. J Am Stat Assoc. 1976;71(354):340–4.
  5. Cohen E. Size-estimation framework with applications to transitive closure and reachability. J Comput Syst Sci. 1997;55(3):441–53.
  6. Cohen E, Datar M, Fujiwara S, Gionis A, Indyk P, Motwani R, Ullman J. Finding interesting associations without support pruning. In: Proceedings of the 16th International Conference on Data Engineering; 2000.
  7. Cormode G, Datar M, Indyk P, Muthukrishnan S. Comparing data streams using Hamming norms. In: Proceedings of the 28th International Conference on Very Large Data Bases; 2002. p. 335–45.
  8. Datar M, Gionis A, Indyk P, Motwani R. Maintaining stream statistics over sliding windows. In: Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms; 2002. p. 635–44.
  9. Datar M, Muthukrishnan S. Estimating rarity and similarity on data stream windows. In: Proceedings of the 10th European Symposium on Algorithms; 2002.
  10. Feigenbaum J, Kannan S, Strauss M, Viswanathan M. An approximate l1-difference algorithm for massive data streams. In: Proceedings of the 40th Annual Symposium on Foundations of Computer Science; 1999.
  11. Flajolet P, Martin G. Probabilistic counting. In: Proceedings of the 24th Annual Symposium on Foundations of Computer Science; 1983. p. 76–82.
  12. Indyk P. Stable distributions, pseudorandom generators, embeddings and data stream computation. In: Proceedings of the 41st Annual Symposium on Foundations of Computer Science; 2000. p. 189–97.
  13. Indyk P. A small approximately min-wise independent family of hash functions. J Algorithm. 2001;38(1):84–90.
  14. On the distributional complexity of disjointness. J Comput Sci Syst. 1984;2.
  15. Saks M, Sun X. The space complexity of approximating the frequency moments. In: Proceedings of the 34th ACM Symposium on Theory of Computing; 2002.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Yahoo! Research, Silicon Valley, USA

Section editors and affiliations

  • Divesh Srivastava
  1. AT&T Labs - Research, AT&T, Bedminster, USA