Abstract
With the slowdown of Dennard scaling and the limited performance gains from multi-core scaling, high-performance computing is shifting toward domain-specific hardware systems that empower big data and high-performance applications. Likewise, dataflow systems are experiencing a revival, with both hardware and software approaches widely exploited. In this work, we give an overview of the origins of dataflow systems and of similar technologies such as systolic architectures, whose principles are applied by some of today’s leading high-performance systems, such as Multiscale Dataflow Computing (MDC). In the second part, we highlight certain applications that could benefit from delegating critical processing to an MDC system. We emphasize algorithms and applications from data analytics, deep learning, and the Internet of Things (IoT), with a special focus on their execution within the cloud environment. We discuss the integration of software distributed dataflow systems such as Apache Spark with MDC systems, analyze design issues and challenges in implementing deep neural networks using MDC, and consider how semantic-enabled IoT platforms and services could be made more effective by using MDC systems. We expect that these selected case studies will motivate researchers to investigate the use of hardware dataflow systems to support applications from other areas with similarly rigid requirements.
Keywords
- Dataflow Systems
- Multi-core Scaling
- Apache Spark
- Dataflow Engine (DFE)
- Resilient Distributed Datasets (RDD)
Acknowledgements
This work was supported by the Serbian Ministry of Education and Science under Project III44006.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Milutinovic, V., Kotlar, M., Stojanovic, M., Dundic, I., Trifunovic, N., Babovic, Z. (2017). DataFlow Systems: From Their Origins to Future Applications in Data Analytics, Deep Learning, and the Internet of Things. In: DataFlow Supercomputing Essentials. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-66125-4_5
DOI: https://doi.org/10.1007/978-3-319-66125-4_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66124-7
Online ISBN: 978-3-319-66125-4
eBook Packages: Computer Science (R0)