
Tracking Queries over Distributed Streams

Part of the book series: Data-Centric Systems and Applications (DCSA)

Abstract

Effective Big Data analytics poses several difficult challenges for modern data management architectures. One such key challenge arises from the inherently streaming nature of big data, which mandates efficient algorithms for querying and analyzing massive, continuous data streams (that is, data that is seen only once and in a fixed order) with limited memory and CPU-time resources. Such streams arise naturally in emerging large-scale event monitoring applications; for instance, network-operations monitoring in large ISPs, where usage information from numerous sites needs to be continuously collected and analyzed for interesting trends. In addition to memory- and time-efficiency concerns, the inherently distributed nature of such applications also raises important communication-efficiency issues, making it critical to carefully optimize the use of the underlying network infrastructure. In this chapter, we provide a brief introduction to the distributed data streaming model and the Geometric Method (GM), a generic technique for effectively tracking complex queries over massive distributed streams. We also discuss several recently proposed extensions to the basic GM framework, such as its combination with stream-sketching tools and local prediction models, as well as more recent developments leading to a more general theory of Safe Zones and interesting connections to convex Euclidean geometry.

Author information

Correspondence to Minos Garofalakis.

Copyright information

© 2016 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Garofalakis, M. (2016). Tracking Queries over Distributed Streams. In: Garofalakis, M., Gehrke, J., Rastogi, R. (eds) Data Stream Management. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28608-0_15

  • DOI: https://doi.org/10.1007/978-3-540-28608-0_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28607-3

  • Online ISBN: 978-3-540-28608-0

  • eBook Packages: Computer Science, Computer Science (R0)
