Cost of Fault-Tolerance on Data Stream Processing
Data streaming engines process data on the fly, in contrast to databases, which first store the data and then process it. To process the increasing amount of data produced every day, data streaming engines run on top of distributed systems. In this setting, failures are likely to happen. Current distributed data streaming engines such as Apache Flink provide fault tolerance. In this paper, we evaluate the performance impact of Flink's fault tolerance mechanisms on a distributed system, both during regular operation (when there are no failures) and when failures occur. We use Intel HiBench to conduct the evaluation.
Keywords: Data streaming, Fault tolerance, Evaluation, HiBench
This research has been partially funded by the European Commission under projects CloudDBAppliance, CrowdHealth and BigDataStack (grants H2020-732051, H2020-727560 and H2020-779747); the Madrid Regional Council, FSE and FEDER under project Cloud4BigData (grant S2013TIC2894); and the Ministry of Economy and Competitiveness (MINECO) under project CloudDB (grant TIN2016-80350).