Comparing Controlflow and Dataflow for Tensor Calculus: Speed, Power, Complexity, and MTBF

  • Milos Kotlar
  • Veljko Milutinovic
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11203)


This article introduces ten different tensor operations, their generalizations, and their implementations for the dataflow paradigm. Tensor operations can be used to address a number of big data problems in machine learning and computer vision, such as speech recognition, visual object recognition, data mining, deep learning, genomics, mind genomics, and applications in civil and geo engineering. As big data applications break the exascale barrier, and the brontoscale barrier in the not so distant future, the main challenge is finding a way to process such large quantities of data.

This article sheds light on various dataflow implementations of tensor operations, mostly those used in machine learning. The iterative nature of tensor operations and the large amounts of data they process make them suitable for the dataflow paradigm. All the dataflow implementations are analyzed comparatively against the related control-flow implementations, for speedup, complexity, power savings, and MTBF. The core contribution of this paper is a table that compares the two paradigms for various data set sizes, and under various conditions of interest.
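The iterative structure mentioned above can be illustrated with a generic sketch (not taken from the paper): a tensor-times-matrix product written as a regular loop nest. Its fixed, data-independent iteration pattern is the property that makes such kernels amenable to dataflow pipelining, where one result element can stream out per clock tick once the pipeline is full.

```python
def ttm_mode1(T, M):
    """Contract a 3-D tensor T (I x J x K) with a matrix M (R x I)
    along the first mode, yielding an R x J x K tensor.

    Illustrative control-flow reference only; a dataflow engine would
    unroll the innermost reduction loop into a hardware pipeline.
    """
    I, J, K = len(T), len(T[0]), len(T[0][0])
    R = len(M)
    out = [[[0.0] * K for _ in range(J)] for _ in range(R)]
    for r in range(R):
        for j in range(J):
            for k in range(K):
                acc = 0.0
                for i in range(I):  # innermost reduction over the shared mode
                    acc += M[r][i] * T[i][j][k]
                out[r][j][k] = acc
    return out
```

The loop bounds depend only on the tensor shapes, never on the data values, so the entire computation graph is known at compile time; this is exactly the regularity that a dataflow compiler exploits.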

The results presented in this paper are applicable both to the current dataflow paradigm implementations and to what we believe are the optimal future dataflow paradigm implementations, which we refer to as the Ultimate dataflow. This portability is possible because the programming model of the current dataflow implementations also applies to the Ultimate dataflow. The major differences between the Ultimate dataflow and the current dataflow implementations lie not in the programming model, but in the hardware structure and in the capabilities of the optimizing compiler. To show these differences, and to indicate what to expect from future dataflow paradigm implementations, this paper starts with an overview of the Ultimate dataflow and its potentials.


Big data · Dataflow computing · Tensor calculus · Ultimate dataflow



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. School of Electrical Engineering, University of Belgrade, Belgrade, Serbia
  2. Fellow of the IEEE, Washington DC, USA
  3. Academia Europaea, London, UK
  4. Department of Computer Science, University of Indiana, Bloomington, USA
  5. Mathematical Institute of the Serbian Academy of Arts and Sciences, Belgrade, Serbia
