Abstract
This article introduces ten tensor operations, their generalizations, and their implementations for the dataflow paradigm. Tensor operations can be used to address a number of big data problems in machine learning and computer vision, such as speech recognition, visual object recognition, data mining, deep learning, genomics, mind genomics, and applications in civil and geo-engineering. As big data applications break the Exascale barrier, and, in the not so distant future, the Brontoscale barrier, the main challenge is finding a way to process such large quantities of data.
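To make the subject concrete, the sketch below shows one representative tensor operation, the mode-n tensor-matrix product, as it might be written for a control-flow machine in NumPy. This is an illustrative sketch of ours, not code from the paper; the function name mode_n_product and the example shapes are our own choices.

```python
# A minimal, hypothetical sketch of a basic tensor operation: the mode-n
# product, which contracts mode n of a tensor T with a matrix M.
import numpy as np

def mode_n_product(T, M, n):
    """Mode-n product: result[..., i, ...] = sum_j M[i, j] * T[..., j, ...]."""
    T_n = np.moveaxis(T, n, 0)                 # bring mode n to the front
    R = np.tensordot(M, T_n, axes=([1], [0]))  # contract mode n with M's columns
    return np.moveaxis(R, 0, n)                # put the new mode back in place

T = np.random.rand(4, 5, 6)   # a third-order tensor
M = np.random.rand(7, 5)      # a matrix acting along mode 1 (size 5)
print(mode_n_product(T, M, 1).shape)  # -> (4, 7, 6)
```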
This article sheds light on various dataflow implementations of tensor operations, mostly those used in machine learning. The iterative nature of tensor operations and the large amounts of data they process make them suitable for the dataflow paradigm. All dataflow implementations are analyzed comparatively against the related control-flow implementations, for speedup, complexity, power savings, and MTBF (mean time between failures). The core contribution of this paper is a table that compares the two paradigms for various data set sizes and under various conditions of interest.
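As a hedged illustration of why iterative, data-heavy kernels suit dataflow, the following toy Python sketch (ours, not the paper's implementation) contrasts the two paradigms on the same dot-product kernel: the control-flow version drives one instruction stream over the data, while the dataflow version is organized as a pipeline of stages through which the data streams, which is roughly how a dataflow engine would lay the computation out in space.

```python
def dot_controlflow(xs, ys):
    """Control flow: a single instruction stream loops over the data."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc

def dot_dataflow(xs, ys):
    """Dataflow, emulated in software: data streams through a fixed pipeline
    of stages; on a real dataflow engine each stage would be a spatial
    hardware unit accepting a new element every clock tick."""
    products = (x * y for x, y in zip(xs, ys))  # stage 1: multiplier
    return sum(products)                        # stage 2: accumulator

a, b = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
assert dot_controlflow(a, b) == dot_dataflow(a, b) == 32.0
```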
The results presented in this paper are intended to apply both to current dataflow implementations and to what we believe are the optimal future dataflow implementations, which we refer to as the Ultimate dataflow. This portability is possible because the programming model of the current dataflow implementations also applies to the Ultimate dataflow. The major differences between the Ultimate dataflow and current dataflow implementations lie not in the programming model, but in the hardware structure and in the capabilities of the optimizing compiler. To show these differences, and to indicate what to expect from future dataflow implementations, this paper begins with an overview of the Ultimate dataflow and its potential.
Cite this paper
Kotlar, M., Milutinovic, V.: Comparing Controlflow and Dataflow for Tensor Calculus: Speed, Power, Complexity, and MTBF. In: Yokota, R., Weiland, M., Shalf, J., Alam, S. (eds.) High Performance Computing. ISC High Performance 2018. Lecture Notes in Computer Science, vol. 11203. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-02465-9_22