Comparing Controlflow and Dataflow for Tensor Calculus: Speed, Power, Complexity, and MTBF

  • Conference paper
  • In: High Performance Computing (ISC High Performance 2018)

Abstract

This article introduces ten different tensor operations and their generalizations, as well as their implementations for the dataflow paradigm. Tensor operations can be used to address a number of big data problems in machine learning and computer vision, such as speech recognition, visual object recognition, data mining, deep learning, genomics, mind genomics, and applications in civil and geo-engineering. As big data applications break the exascale barrier, and in the not-so-distant future the brontoscale barrier, the main challenge is finding a way to process such large quantities of data.
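
To make the flavor of these operations concrete (the full list of ten operations appears in the body of the paper, not in this preview), here is a minimal NumPy sketch of one representative tensor operation, the mode-n tensor-matrix product. The function name and example shapes are illustrative assumptions, not taken from the paper.

    import numpy as np

    def mode_n_product(tensor, matrix, n):
        # Multiply a tensor by a matrix along mode n:
        # result[i1,..,j,..,iN] = sum_k tensor[i1,..,k,..,iN] * matrix[j, k]
        t = np.moveaxis(tensor, n, 0)           # bring mode n to the front
        front, rest = t.shape[0], t.shape[1:]
        unfolded = t.reshape(front, -1)         # mode-n unfolding (matricization)
        product = matrix @ unfolded             # a single matrix-matrix multiply
        folded = product.reshape((matrix.shape[0],) + rest)
        return np.moveaxis(folded, 0, n)        # restore the original mode order

    # Example: contract a 3 x 4 x 5 tensor with a 2 x 4 matrix along mode 1
    T = np.random.rand(3, 4, 5)
    M = np.random.rand(2, 4)
    print(mode_n_product(T, M, 1).shape)        # (3, 2, 5)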

This article sheds light on various dataflow implementations of tensor operations, mostly those used in machine learning. The iterative nature of tensor operations and the large amounts of data involved make them suitable for the dataflow paradigm. All the dataflow implementations are compared with the related control-flow implementations in terms of speedup, complexity, power savings, and MTBF. The core contribution of this paper is a table that compares the two paradigms for various data set sizes and under various conditions of interest.
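
The control-flow versus dataflow contrast underlying these comparisons can be sketched in a few lines: under control flow a program counter drives every loop iteration, whereas under dataflow the computation is a fixed pipeline of operators through which the data streams. The Python generators below are only a conceptual stand-in for hardware pipeline stages, using a dot product as the example operation; this is not the Maxeler-style programming model used by the paper's actual implementations.

    def controlflow_dot(a, b):
        # Control flow: instructions are fetched and executed
        # iteration by iteration under a program counter.
        acc = 0.0
        for x, y in zip(a, b):
            acc += x * y
        return acc

    def dataflow_dot(a, b):
        # Dataflow (conceptual): a fixed graph of operators through
        # which the data streams, one pipeline stage per line below.
        pairs = zip(a, b)                     # stream source
        products = (x * y for x, y in pairs)  # multiply stage
        return sum(products)                  # accumulation stage

    a, b = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
    assert controlflow_dot(a, b) == dataflow_dot(a, b) == 32.0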

The results presented in this paper are applicable both to current dataflow paradigm implementations and to what we believe are the optimal future implementations of the dataflow paradigm, which we refer to as the Ultimate dataflow. This portability is possible because the programming model of current dataflow implementations also applies to the Ultimate dataflow. The major differences between the Ultimate dataflow and current dataflow implementations lie not in the programming model, but in the hardware structure and in the capabilities of the optimizing compiler. To show these differences, and to show what to expect from future dataflow paradigm implementations, this paper starts with an overview of the Ultimate dataflow and its potential.

Author information

Correspondence to Milos Kotlar.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Kotlar, M., Milutinovic, V. (2018). Comparing Controlflow and Dataflow for Tensor Calculus: Speed, Power, Complexity, and MTBF. In: Yokota, R., Weiland, M., Shalf, J., Alam, S. (eds.) High Performance Computing. ISC High Performance 2018. Lecture Notes in Computer Science, vol. 11203. Springer, Cham. https://doi.org/10.1007/978-3-030-02465-9_22

  • DOI: https://doi.org/10.1007/978-3-030-02465-9_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-02464-2

  • Online ISBN: 978-3-030-02465-9

  • eBook Packages: Computer Science, Computer Science (R0)
