Encyclopedia of Big Data Technologies

Living Edition
Editors: Sherif Sakr, Albert Zomaya

Stream Window Aggregation Semantics and Optimization

  • Paris Carbone
  • Asterios Katsifodimos
  • Seif Haridi
Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-63962-8_154-1

Definition

Sliding windows are bounded sets of records that evolve together with an infinite data stream. Each new sliding window evicts some records from the previous window and admits newly arrived ones. Aggregations on windows typically derive a metric, such as an average or a sum, over a value in each window. The main challenge in applying aggregations to sliding windows is that naive execution performs a large amount of redundant computation, because consecutive windows share many records. Optimization techniques have been developed over the years to eliminate this redundancy and make sliding window aggregation efficient on large data streams.
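The redundancy described above can be illustrated with a minimal sketch. It assumes a count-based window of a fixed number of records that slides by one record at a time, and a sum aggregate; the entry does not prescribe these choices, and the function names are hypothetical. The first function recomputes every window from scratch, while the second reuses the previous result by adding the new record and subtracting the evicted one, which works only because a sum is invertible. It is not meant to represent any specific technique from the literature.

```python
from collections import deque
from typing import Iterable, Iterator


def naive_sliding_sums(stream: Iterable[float], size: int) -> Iterator[float]:
    """Recompute the sum of every full window from scratch (O(size) per window)."""
    window: deque = deque(maxlen=size)  # a bounded deque drops the oldest record itself
    for record in stream:
        window.append(record)
        if len(window) == size:
            # Re-reads every record shared with the previous window.
            yield sum(window)


def incremental_sliding_sums(stream: Iterable[float], size: int) -> Iterator[float]:
    """Keep a running sum: add the new record, subtract the evicted one (O(1) per window)."""
    window: deque = deque()
    running = 0
    for record in stream:
        window.append(record)
        running += record
        if len(window) > size:
            # Eviction by subtraction assumes the aggregate is invertible (true for sum).
            running -= window.popleft()
        if len(window) == size:
            yield running


# Hypothetical input: six records, windows of three records sliding by one.
records = [3, 1, 4, 1, 5, 9]
naive = list(naive_sliding_sums(records, size=3))
incremental = list(incremental_sliding_sums(records, size=3))
print(naive)        # [8, 6, 10, 15]
print(incremental)  # same sums, without re-reading shared records
```

Both functions emit one sum per full window; the difference is the work per window, which grows with the window size in the naive version and stays constant in the incremental one. Aggregates without an inverse, such as a maximum, need a different strategy than the subtraction used here.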

Overview

Data stream processing has evolved significantly throughout the years, both in terms of system support and in programming model primitives. Alongside adopting common data-centric operators from relational algebra and functional programming such as select, join, flatmap, reduce,...

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Paris Carbone (1)
  • Asterios Katsifodimos (2)
  • Seif Haridi (1)

  1. KTH Royal Institute of Technology, Stockholm, Sweden
  2. TU Delft, Delft, Netherlands

Section editors and affiliations

  • Asterios Katsifodimos (1)
  • Pramod Bhatotia (2)

  1. Delft University of Technology, Delft, Netherlands
  2. School of Informatics, University of Edinburgh, Edinburgh, UK