Stream processing applications analyze continuously generated, multimodal, and distributed data streams. This demands a combination of features that distinguishes streaming analytics from traditional data analysis paradigms, which are often batch and offline. These features can be summarized as follows:
In-Motion Analysis: Streaming analytics must process data on the fly, as it continues to flow, in order to support real-time, low-latency analysis and to match the computation to the naturally streaming properties of the data. This limits the amount of prior data that can be accessed and necessitates one-pass, online algorithms. Several streaming algorithms are described in [7, 1].
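As a minimal illustration of this one-pass, online constraint (a sketch, not from this entry), the following Python snippet maintains a stream's running mean and variance with O(1) memory using Welford's method, touching each element exactly once and never revisiting prior data:

```python
# Illustrative one-pass, online algorithm: Welford's method keeps the
# running mean and variance of a stream in O(1) memory, a single pass.

class RunningStats:
    def __init__(self):
        self.n = 0        # elements seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # sum of squared deviations from the mean

    def update(self, x):
        """Consume one stream element; prior data is never accessed again."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Sample variance; defined once at least two elements have arrived.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

stats = RunningStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(x)
print(stats.mean)        # ~5.0
print(stats.variance())  # ~32/7
```

The same structure (constant state updated per element) underlies many of the one-pass summaries in the reading list, such as sketches and incremental histograms.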
Distributed Analysis: Data streams are often distributed and/or high volume, and their rates make centralized solutions infeasible. Hence, the applications and analytic algorithms themselves need to be distributed.
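One common pattern for distributing such analysis (sketched hypothetically below; the width, depth, and hash choice are illustrative, not from this entry) is to have each node compress its local substream into a mergeable summary, such as the Count-Min sketch of Cormode and Muthukrishnan, so that only fixed-size summaries, never raw data, travel to a coordinator:

```python
# Hypothetical distributed-analysis sketch: each node summarizes its local
# substream into a Count-Min sketch; sketches merge by elementwise addition.

import hashlib

WIDTH, DEPTH = 256, 4  # illustrative sizes; real choices trade space for error

def empty_sketch():
    return [[0] * WIDTH for _ in range(DEPTH)]

def bucket(item, row):
    # Deterministic per-row hash into one of WIDTH counters.
    h = hashlib.blake2b(f"{row}:{item}".encode()).digest()
    return int.from_bytes(h[:8], "big") % WIDTH

def update(sketch, item, count=1):
    for row in range(DEPTH):
        sketch[row][bucket(item, row)] += count

def estimate(sketch, item):
    # Count-Min returns the minimum counter: an overestimate of the true count.
    return min(sketch[row][bucket(item, row)] for row in range(DEPTH))

def merge(a, b):
    # Elementwise addition: the merged sketch summarizes the union of streams.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Two "nodes" summarize disjoint substreams; a coordinator merges the results.
node1, node2 = empty_sketch(), empty_sketch()
for item in ["a", "a", "b"]:
    update(node1, item)
for item in ["a", "c"]:
    update(node2, item)
merged = merge(node1, node2)
print(estimate(merged, "a"))  # true count is 3; CM never underestimates
```

Because merging is associative and commutative, the summaries can be combined in any order across the distributed topology, which is what makes this family of synopses well suited to distributed stream settings.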
High-Performance Analysis:...
Recommended Reading
Aggarwal CC, editor. Data streams: models and algorithms. Boston: Springer; 2007.
Aggarwal CC, Han J, Wang J, Yu PS. A framework for clustering evolving data streams. In: Proceedings of the 29th International Conference on Very Large Data Bases; 2003. p. 81–92.
Aggarwal CC, Han J, Wang J, Yu PS. A framework for high dimensional projected clustering of data streams. In: Proceedings of the 30th International Conference on Very Large Data Bases; 2004. p. 852–63.
Aggarwal CC, Han J, Wang J, Yu PS. On demand classification of data streams. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2004. p. 503–8.
Aggarwal CC, Yu PS. A framework for clustering uncertain data streams. In: Proceedings of the 24th International Conference on Data Engineering; 2008. p. 150–59.
Alon N, Matias Y, Szegedy M. The space complexity of approximating the frequency moments. In: Proceedings of the 28th Annual ACM Symposium on Theory of Computing; 1996. p. 20–9.
Andrade H, Gedik B, Turaga D. Fundamentals of stream processing: application design, systems, and analytics. Cambridge: Cambridge University Press; 2013.
Arasu A, Manku G. Approximate counts and quantiles over sliding windows. In: Proceedings of the 23rd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems; 2004. p. 286–96.
Ardilly P, Tillé Y. Sampling methods. Springer; 2006.
Babcock B, Datar M, Motwani R, O’Callaghan L. Maintaining variance and k-medians over data stream windows. In: Proceedings of the 22nd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems; 2003. p. 234–43.
Bagnall AJ, Ratanamahatana CA, Keogh EJ, Lonardi S, Janacek GJ. A bit level representation for time series data mining with shape based similarity. Data Min Knowl Disc. 2006;13(1):11–40.
Boufounos P. Universal rate-efficient scalar quantization. IEEE Trans Inf Theory. 2012;58(3):1861–72.
Chandola V, Banerjee A, Kumar V. Anomaly detection: a survey. ACM Comput Surv. 2009;41(3).
Chang JH, Lee WS. Finding recent frequent itemsets adaptively over online data streams. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2003. p. 487–92.
Cheng J, Ke Y, Ng W. A survey on algorithms for mining frequent itemsets over data streams. Knowl Inf Syst. 2008;16(1):1–27.
Chi Y, Wang H, Yu PS, Muntz RR. Moment: maintaining closed frequent itemsets over a stream sliding window. In: Proceedings of the 4th IEEE International Conference on Data Mining; 2004. p. 59–66.
Cormode G, Muthukrishnan S. An improved data stream summary: the count-min sketch and its applications. J Algorithms. 2005;55(1):58–75.
Cormode G, Garofalakis M, Haas P, Jermaine C. Synopses for massive data: samples, histograms, wavelets, sketches. Foundations and trends in databases series. Boston: Now Publishing; 2011.
Datar M, Gionis A, Indyk P, Motwani R. Maintaining stream statistics over sliding windows. SIAM J Comput. 2002;31(6):1794–813.
Delp E, Saenz M, Salama P. Block truncation coding. In: Bovik A, editor. The handbook of image and video processing. Amsterdam/Boston: Academic Press; 2005. p. 661–72.
Domingos P, Hulten G. Mining high-speed data streams. In: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2000. p. 71–80.
Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res. 2011;12:2121–59.
Fan W, Stolfo SJ, Zhang J. The application of AdaBoost for distributed, scalable and on-line learning. In: Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 1999. p. 362–66.
Fang J, Li H. Optimal/near-optimal dimensionality reduction for distributed estimation in homogeneous and certain inhomogeneous scenarios. IEEE Trans Signal Process. 2010;58(8):4339–53.
Fox J, editor. Applied regression analysis, linear models, and related methods. Thousands Oaks: SAGE Publications; 1997.
Ganguly S, Majumder A. CR-precis: a deterministic summary structure for update data streams. In: Proceedings of the International Symposium on Combinatorics; 2007. p. 48–59.
Gardner WA. Learning characteristics of stochastic-gradient-descent algorithms: a general study, analysis, and critique. Signal Process. 1984;6(2):113–33.
Gersho A, Gray RM. Vector quantization and signal compression. Boston: Kluwer Academic Publishers; 1991.
Giannella C, Han J, Pei J, Yan X, Yu P. Mining frequent patterns in data streams at multiple time granularities. In: Kargupta H, Joshi A, Sivakumar K, Yesha Y, editors. Data mining: next generation challenges and future directions. MIT Press; 2002. p. 105–24.
Gilbert A, Guha S, Indyk P, Kotidis Y, Muthukrishnan S, Strauss M. Fast, small-space algorithms for approximate histogram maintenance. In: Proceedings of the 34th Annual ACM Symposium on Theory of Computing; 2002. p. 389–98.
Gilbert A, Kotidis Y, Muthukrishnan S, Strauss M. Surfing wavelets on streams: one-pass summaries for approximate aggregate queries. In: Proceedings of the 27th International Conference on Very Large Data Bases; 2001. p. 79–88.
Goethals B. Survey on frequent pattern mining. Technical report, Helsinki Institute for Information Technology, Basic Research Unit; 2003.
Guha S, Mishra N, Motwani R, O'Callaghan L. Clustering data streams. In: Proceedings of the 41st Annual Symposium on Foundations of Computer Science; 2000. p. 359–66.
Zheng H, Kulkarni SR, Poor HV. Attribute-distributed learning: models, limits, and algorithms. IEEE Trans Signal Process. 2011;59(1):386–98.
Hansen LK, Salamon P. Neural network ensembles. IEEE Trans Pattern Anal Mach Intell. 1990;12(10):993–1001.
Hulten G, Spencer L, Domingos P. Mining time changing data streams. In: Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2001. p. 97–106.
Jin R, Agrawal G. An algorithm for in-core frequent itemset mining on streaming data. In: Proceedings of the 5th IEEE International Conference on Data Mining; 2005. p. 201–17.
Kamilov U, Goyal VK, Rangan S. Optimal quantization for compressive sensing under message passing reconstruction. In: Proceedings of the IEEE International Symposium on Information Theory; 2011. p. 459–63.
Karampatziakis N, Langford J. Online importance weight aware updates. In: Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence; 2011. p. 392–99.
Kira K, Rendell L. A practical approach to feature selection. In: Proceedings of the 9th International Conference on Machine Learning; 1992. p. 249–56.
Lin J, Vlachos M, Keogh E, Gunopulos D. Iterative incremental clustering of data streams. In: Advances in Database Technology, Proceedings of the 9th International Conference on Extending Database Technology; 2004. p. 106–22.
Lughofer E. Extensions of vector quantization for incremental clustering. Pattern Recogn. 2008;41(3):995–1011.
Mallat S. A wavelet tour of signal processing, the sparse way. Amsterdam: Academic Press; 2009.
Manku GS, Motwani R. Approximate frequency counts over data streams. In: Proceedings of the 28th International Conference on Very Large Data Bases; 2002. p. 346–57.
Masud MM, Gao J, Khan L, Han J, Thuraisingham B. Integrating novel class detection with classification for concept-drifting data streams. In: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases; 2009. p. 79–94.
Mateos G, Bazerque JA, Giannakis GB. Distributed sparse linear regression. IEEE Trans Signal Process. 2010;58(10):5262–76.
Matias Y, Gibbons P, Poosala V. Fast incremental maintenance of approximate histograms. In: Proceedings of the 23rd International Conference on Very Large Data Bases; 1997. p. 466–75.
McMahan B, Streeter M. Adaptive bound optimization for online convex optimization. In: Proceedings of the International Conference on Learning Theory; 2010. p. 244–56.
Monemizadeh M, Woodruff DP. 1-pass relative-error lp-sampling with applications. In: Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms; 2010. p. 1143–60.
Chaudhuri S, Motwani R, Narasayya V. Random sampling for histogram construction: how much is enough? In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1998. p. 436–47.
Papadimitriou S, Sun J, Faloutsos C. Streaming pattern discovery in multiple time-series. In: Proceedings of the 31st International Conference on Very Large Data Bases; 2005. p. 697–708.
Percival D, Walden A. Spectral analysis for physical applications. Cambridge: Cambridge University Press; 1993.
Pharr M, Humphreys G. Physically based rendering: from theory to implementation. Burlington: Morgan Kaufmann; 2010.
Polikar R. Ensemble based systems in decision making. IEEE Circuits Syst Mag. 2006;6(3):21–45.
Russell S, Norvig P. Artificial intelligence: a modern approach. Upper Saddle River: Prentice Hall; 2010.
Sayood K. Introduction to data compression. Morgan Kaufmann; 2005.
Schapire RE, Singer Y. Improved boosting algorithms using confidence-rated predictors. Mach Learn. 1999;37(3):297–336.
Shinozaki T, Kubota Y, Furui S. Unsupervised acoustic model adaptation based on ensemble methods. IEEE J Sel Top Signal Process. 2010;4(6):1007–15.
Sugiyama M, Kawanabe M, Chui PL. Dimensionality reduction for density ratio estimation in high-dimensional spaces. Neural Netw. 2010;23(1):44–59.
Takezawa K, editor. Introduction to nonparametric regression. Wiley; 2005.
Towfic ZJ, Chen J, Sayed AH. On distributed online classification in the midst of concept drifts. Neurocomputing. 2013;112(Jul):139–52.
Vapnik V. Statistical learning theory. New York: Wiley; 1998.
Wang H, Fan W, Yu PS, Han J. Mining concept-drifting data streams using ensemble classifiers. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2003. p. 226–35.
Witten IH, Frank E, Hall MA, editors. Data mining: practical machine learning tools and techniques. 3rd ed. Amsterdam: Morgan Kauffman; 2011.
Yi B-K, Sidiropoulos N, Johnson T, Jagadish HV, Faloutsos C, Biliris A. Online data mining for co-evolving time sequences. In: Proceedings of the 16th International Conference on Data Engineering; 2000. p. 13–22.
Zhang T, Ramakrishnan R, Livny M. BIRCH: an efficient data clustering method for very large databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1996. p. 103–14.
Zhu Y, Shasha D. Statstream: statistical monitoring of thousands of data streams in real-time. In: Proceedings of the 28th International Conference on Very Large Data Bases; 2002. p. 358–69.
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
Turaga, D. (2018). Streaming Analytics. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80673
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9