Big Stream Systems
Continuous workflow execution frameworks; Distributed stream processing
Big stream systems aim to bring the scalability of batch processing frameworks to stream applications. Stream processing systems face different constraints and challenges than batch processing systems. Because streams are unbounded and potentially high-volume, stream applications must execute continuously and limit the role of disk-based storage: the throughput of a high-volume stream can exceed the throughput of disks, and the stream data may have no lasting value beyond the meaning that can be extracted from it. Big stream systems address the challenge of achieving high scalability in stream processing by (1) keeping data moving and off disks, (2) implementing fault-tolerance strategies that allow stream data to persist in the event of faults, and (3) spreading computational workloads across many nodes while preserving the integrity and order of the...
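As a rough illustration of point (3), the sketch below routes each stream element to a worker by a stable hash of its key, so that all elements with the same key reach the same worker in arrival order while state stays in memory. It is a single-process toy under stated assumptions, not the API of any system named in this entry; `partition_for`, `Worker`, and `run` are hypothetical names introduced only for this example.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_workers: int) -> int:
    """Map a key to a worker index with a stable hash (assumed routing rule)."""
    return zlib.crc32(key.encode("utf-8")) % num_workers


class Worker:
    """Hypothetical worker that keeps its running state in memory only."""

    def __init__(self) -> None:
        self.counts = defaultdict(int)  # per-key running count
        self.seen = []                  # (key, seq) pairs in processing order

    def process(self, key: str, seq: int) -> None:
        self.counts[key] += 1
        self.seen.append((key, seq))


def run(stream, num_workers: int = 3):
    """Feed elements to workers one at a time, without buffering to disk."""
    workers = [Worker() for _ in range(num_workers)]
    for seq, key in enumerate(stream):
        workers[partition_for(key, num_workers)].process(key, seq)
    return workers


if __name__ == "__main__":
    for i, w in enumerate(run(["a", "b", "a", "c", "b", "a"])):
        print(i, dict(w.counts))
```

Because the routing depends only on the key, the order of elements is preserved within each key but not across keys handled by different workers. A real system would also need the fault-tolerance strategies of point (2), which this sketch omits.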