Apache Flink

Synonyms

Stratosphere platform

Overview

Today, virtually all data is continuously generated as streams of events. This includes business transactions, interactions with web or mobile applications, sensor or device logs, and database modifications. There are two ways to process continuously produced data: batch processing and stream processing. In stream processing, a continuously running application ingests and processes the data immediately as it arrives. In batch processing, the data is first recorded and persisted in a storage system, such as a file system or a database system, and is then processed (periodically) by an application that operates on a bounded data set. Stream processing typically produces results with lower latency, but it introduces operational challenges, because streaming applications that run 24 × 7 place high demands on failure recovery and consistency guarantees.
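The contrast between the two processing styles can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it does not use Flink's API, and the `(key, amount)` event format and the function names `batch_totals` and `stream_totals` are hypothetical choices made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical event format for this sketch: (key, amount).
Event = Tuple[str, int]


def batch_totals(stored_events: List[Event]) -> Dict[str, int]:
    """Batch style: the bounded data set is recorded first, then processed in one pass."""
    totals: Dict[str, int] = defaultdict(int)
    for key, amount in stored_events:
        totals[key] += amount
    # A single result is available only after all input has been read.
    return dict(totals)


def stream_totals(events: Iterable[Event]) -> Iterator[Tuple[str, int]]:
    """Stream style: each event updates running state and yields a result immediately."""
    totals: Dict[str, int] = defaultdict(int)
    for key, amount in events:
        totals[key] += amount
        # An updated result is emitted per event, without waiting for the input to end.
        yield key, totals[key]


events = [("a", 1), ("b", 5), ("a", 2)]
batch_result = batch_totals(events)
stream_result = list(stream_totals(iter(events)))
```

In this sketch the batch function returns one final total per key, whereas the streaming generator emits an updated total after every event. The streaming variant must keep its `totals` state for as long as it runs, which hints at why long-running streaming applications depend on failure recovery and consistency guarantees.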

The most fundamental difference between batch and stream processing applications is that...

Recommended Reading

  • Friedman E, Tzoumas K (2016) Introduction to Apache Flink: stream processing for real time and beyond. O’Reilly Media, Sebastopol. ISBN 1491976586

  • Hueske F, Kalavri V (2018) Stream processing with Apache Flink: fundamentals, implementation, and operation of streaming applications. O’Reilly Media, Sebastopol. ISBN 149197429X

Author information

Corresponding author

Correspondence to Fabian Hueske.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this entry

Cite this entry

Hueske, F., Walther, T. (2018). Apache Flink. In: Sakr, S., Zomaya, A. (eds) Encyclopedia of Big Data Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-63962-8_303-1

  • DOI: https://doi.org/10.1007/978-3-319-63962-8_303-1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63962-8

  • Online ISBN: 978-3-319-63962-8

  • eBook Packages: Springer Reference Mathematics; Reference Module Computer Science and Engineering

Chapter history

  1. Latest: Apache Flink. Published: 17 May 2022. DOI: https://doi.org/10.1007/978-3-319-63962-8_303-2

  2. Original: Apache Flink. Published: 24 April 2018. DOI: https://doi.org/10.1007/978-3-319-63962-8_303-1