Abstract
This article introduces ten tensor operations, their generalizations, and their implementations for the dataflow paradigm. Tensor operations can be used to address a number of big data problems in machine learning and computer vision, such as speech recognition, visual object recognition, data mining, deep learning, genomics, mind genomics, and applications in civil and geo-engineering. As big data applications break the Exascale barrier, and, in the not so distant future, the Brontoscale barrier, the main challenge is finding a way to process such large quantities of data.
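To make the subject concrete, the sketch below shows one representative tensor operation, the mode-n tensor-matrix product, as it might be written for a control-flow machine in NumPy. This is an illustrative sketch of ours, not code from the paper; the function name mode_n_product and the example shapes are our own choices.

```python
# A minimal, hypothetical sketch of a basic tensor operation: the mode-n
# product, which contracts mode n of a tensor T with a matrix M.
import numpy as np

def mode_n_product(T, M, n):
    """Mode-n product: result[..., i, ...] = sum_j M[i, j] * T[..., j, ...]."""
    T_n = np.moveaxis(T, n, 0)                 # bring mode n to the front
    R = np.tensordot(M, T_n, axes=([1], [0]))  # contract mode n with M's columns
    return np.moveaxis(R, 0, n)                # put the new mode back in place

T = np.random.rand(4, 5, 6)   # a third-order tensor
M = np.random.rand(7, 5)      # a matrix acting along mode 1 (size 5)
print(mode_n_product(T, M, 1).shape)  # -> (4, 7, 6)
```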
This article sheds light on various dataflow implementations of tensor operations, mostly those used in machine learning. The iterative nature of tensor operations and the large amounts of data they process make them suitable for the dataflow paradigm. All dataflow implementations are analyzed comparatively against the related control-flow implementations, for speedup, complexity, power savings, and MTBF (mean time between failures). The core contribution of this paper is a table that compares the two paradigms for various data set sizes and under various conditions of interest.
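As a hedged illustration of why iterative, data-heavy kernels suit dataflow, the following toy Python sketch (ours, not the paper's implementation) contrasts the two paradigms on the same dot-product kernel: the control-flow version drives one instruction stream over the data, while the dataflow version is organized as a pipeline of stages through which the data streams, which is roughly how a dataflow engine would lay the computation out in space.

```python
def dot_controlflow(xs, ys):
    """Control flow: a single instruction stream loops over the data."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc

def dot_dataflow(xs, ys):
    """Dataflow, emulated in software: data streams through a fixed pipeline
    of stages; on a real dataflow engine each stage would be a spatial
    hardware unit accepting a new element every clock tick."""
    products = (x * y for x, y in zip(xs, ys))  # stage 1: multiplier
    return sum(products)                        # stage 2: accumulator

a, b = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
assert dot_controlflow(a, b) == dot_dataflow(a, b) == 32.0
```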
The results presented in this paper are intended to apply both to current dataflow implementations and to what we believe are the optimal future dataflow implementations, which we refer to as the Ultimate dataflow. This portability is possible because the programming model of the current dataflow implementations also applies to the Ultimate dataflow. The major differences between the Ultimate dataflow and current dataflow implementations lie not in the programming model, but in the hardware structure and in the capabilities of the optimizing compiler. To show these differences, and to indicate what to expect from future dataflow implementations, this paper begins with an overview of the Ultimate dataflow and its potential.
Cite this paper
Kotlar, M., Milutinovic, V.: Comparing Controlflow and Dataflow for Tensor Calculus: Speed, Power, Complexity, and MTBF. In: Yokota, R., Weiland, M., Shalf, J., Alam, S. (eds.) High Performance Computing. ISC High Performance 2018. Lecture Notes in Computer Science, vol. 11203. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-02465-9_22