DataFlow Systems: From Their Origins to Future Applications in Data Analytics, Deep Learning, and the Internet of Things

Chapter
Part of the Computer Communications and Networks book series (CCN)

Abstract

With Dennard scaling slowing down and multi-core scaling delivering limited performance gains, high-performance computing is shifting toward domain-specific hardware systems that empower big data and high-performance applications. Likewise, dataflow systems are experiencing a revival, with both hardware and software approaches widely exploited. In this work, we give an overview of the origins of dataflow systems and of related technologies such as systolic architectures, whose principles are applied by some of today’s leading high-performance systems, such as Multiscale Dataflow Computing (MDC). In the second part, we highlight applications that could benefit from delegating critical processing to an MDC system. We emphasize algorithms and applications from data analytics, deep learning, and the Internet of Things (IoT), with a special focus on their execution within the cloud environment. We discuss the integration of software distributed dataflow systems such as Apache Spark with MDC systems, analyze design issues and challenges in implementing deep neural networks using MDC, and examine how semantic-enabled IoT platforms and services could become more effective by using MDC systems. We expect these selected case studies to motivate researchers to investigate the use of hardware dataflow systems to support applications from other areas with similarly rigid requirements.

Keywords

Dataflow Systems; Multi-core Scaling; Apache Spark; Dataflow Engine (DFE); Resilient Distributed Datasets (RDD)
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgements

This work was supported by the Serbian Ministry of Education and Science under Project III44006.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. School of Electrical Engineering, University of Belgrade, Belgrade, Serbia
  2. University of Bern, Bern/Fribourg/Neuchâtel, Switzerland
  3. Maxeler Technologies, London, UK
