Abstract
Effective Big Data analytics poses several difficult challenges for modern data management architectures. One key challenge arises from the streaming nature of big data, which mandates efficient algorithms for querying and analyzing massive, continuous data streams (that is, data that is seen only once and in a fixed order) with limited memory and CPU-time resources. Such streams arise naturally in emerging large-scale event monitoring applications; for instance, network-operations monitoring in large ISPs, where usage information from numerous sites needs to be continuously collected and analyzed for interesting trends. In addition to memory- and time-efficiency concerns, the inherently distributed nature of such applications also raises important communication-efficiency issues, making it critical to carefully optimize the use of the underlying network infrastructure. In this chapter, we provide a brief introduction to the distributed data streaming model and the Geometric Method (GM), a generic technique for effectively tracking complex queries over massive distributed streams. We also discuss several recently proposed extensions to the basic GM framework, such as the combination with stream-sketching tools and local prediction models, as well as more recent developments leading to a more general theory of Safe Zones and interesting connections to convex Euclidean geometry.
Copyright information
© 2016 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Garofalakis, M. (2016). Tracking Queries over Distributed Streams. In: Garofalakis, M., Gehrke, J., Rastogi, R. (eds) Data Stream Management. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28608-0_15
DOI: https://doi.org/10.1007/978-3-540-28608-0_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28607-3
Online ISBN: 978-3-540-28608-0
eBook Packages: Computer Science (R0)