DataFlow Systems: From Their Origins to Future Applications in Data Analytics, Deep Learning, and the Internet of Things

  • Chapter in: DataFlow Supercomputing Essentials

Abstract

With Dennard scaling slowing down and multicore scaling yielding only limited performance gains, we are witnessing a shift in high-performance computing toward domain-specific hardware systems that power big data and high-performance applications. Dataflow systems are likewise experiencing a revival, with both hardware and software approaches being widely exploited. In this work, we give an overview of the origins of dataflow systems and of related technologies such as systolic architectures, whose principles are applied in some of today’s leading high-performance systems, such as Multiscale Dataflow Computing (MDC). In the second part, we highlight selected applications that could benefit from delegating critical processing to an MDC system. We emphasize algorithms and applications from data analytics, deep learning, and the Internet of Things (IoT), with a special focus on their execution in the cloud environment. We discuss the integration of distributed software dataflow systems such as Apache Spark with MDC systems. We also analyze design issues and implementation challenges for deep neural networks on MDC, and examine how semantic-enabled IoT platforms and services could become more effective by using MDC systems. We expect these case studies to motivate researchers to investigate the use of hardware dataflow systems to support applications from other areas with similarly rigid requirements.
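The systolic principle mentioned above can be illustrated with a short simulation. In a systolic array, data pulses rhythmically through a regular grid of simple processing elements (PEs), and each PE performs one local operation per clock tick. The following minimal Python sketch simulates an output-stationary systolic array computing a matrix product. It is a hypothetical teaching model written for illustration only, not code from this chapter or from any MDC toolchain. The function name `systolic_matmul` and the choice of an output-stationary layout are assumptions.

```python
from typing import List

Matrix = List[List[float]]


def systolic_matmul(A: Matrix, B: Matrix) -> Matrix:
    """Simulate an n x m output-stationary systolic array computing C = A @ B.

    Illustrative model only (hypothetical): PE(i, j) keeps a running sum for
    C[i][j]. Elements of A enter at the left edge and move one PE to the right
    per tick; elements of B enter at the top edge and move one PE down per
    tick. Row i of A and column j of B are skewed by i and j ticks, so A[i][t]
    and B[t][j] meet in PE(i, j) at tick i + j + t.
    """
    n, k, m = len(A), len(B), len(B[0])
    if any(len(row) != k for row in A):
        raise ValueError("inner dimensions of A and B do not match")

    acc = [[0] * m for _ in range(n)]    # stationary partial sums, one per PE
    a_reg = [[0] * m for _ in range(n)]  # A operand currently held by each PE
    b_reg = [[0] * m for _ in range(n)]  # B operand currently held by each PE

    # The last operand pair meets at tick (n-1) + (m-1) + (k-1).
    for tick in range(n + m + k - 2):
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if j == 0:
                    t = tick - i  # skewed injection at the left edge
                    new_a[i][j] = A[i][t] if 0 <= t < k else 0
                else:
                    new_a[i][j] = a_reg[i][j - 1]  # value from left neighbour
                if i == 0:
                    t = tick - j  # skewed injection at the top edge
                    new_b[i][j] = B[t][j] if 0 <= t < k else 0
                else:
                    new_b[i][j] = b_reg[i - 1][j]  # value from upper neighbour
        a_reg, b_reg = new_a, new_b
        # Every PE performs one multiply-accumulate per tick (in parallel in hardware).
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Each operand is read once at the array edge and then reused as it moves between neighbouring PEs instead of being fetched repeatedly from memory. This local reuse is what makes systolic and dataflow designs attractive for compute-bound kernels such as the matrix products in deep neural networks.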

Acknowledgements

This work was supported by the Serbian Ministry of Education and Science under Project III44006.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Milutinovic, V., Kotlar, M., Stojanovic, M., Dundic, I., Trifunovic, N., Babovic, Z. (2017). DataFlow Systems: From Their Origins to Future Applications in Data Analytics, Deep Learning, and the Internet of Things. In: DataFlow Supercomputing Essentials. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-66125-4_5

  • DOI: https://doi.org/10.1007/978-3-319-66125-4_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66124-7

  • Online ISBN: 978-3-319-66125-4

  • eBook Packages: Computer Science, Computer Science (R0)