Scalable, Portable, Verifiable Kronecker Products on Multi-scale Computers

  • Lenore Mullin
  • James Raynolds
Part of the Studies in Computational Intelligence book series (SCI, volume 539)


Understanding how data is laid out and accessed is paramount to the optimal performance of an algorithm on one or many processors. This paper addresses the need for efficient tools to implement and carry out tensor-based computations for scientific and engineering applications. In particular, we focus on certain ubiquitous operations such as outer products of arbitrary multi-dimensional arrays and matrix Kronecker products. We advocate an algebraic methodology based on A Mathematics of Arrays (MoA) and the ψ-Calculus, in which any array-based computer language (such as MATLAB) would be augmented to achieve optimal performance for the computation of multiple outer products. In this approach, an Operational Normal Form (ONF), which specifies the most efficient implementation in terms of starts, stops, and strides, is mathematically derived given specific details of the processor/memory hierarchy. The vision of this research is the creation of a system in which the application scientist or engineer can use a functional subset of his/her favorite language and, in so doing, generate highly efficient code with compiler-like optimizations.
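
To make the phrase "starts, stops, and strides" concrete, the following is a minimal sketch of a matrix Kronecker product computed directly on flat, row-major buffers. It is not the ONF derived in the paper; the function names `kron_flat` and `outer_flat`, the row-major layout, and the loop order are all assumptions chosen for illustration. It also shows that the Kronecker product holds the same entries as the outer product of the two flattened arrays, only in a different index order.

```python
def kron_flat(a, a_shape, b, b_shape):
    """Kronecker product of two matrices stored as flat row-major lists.

    Illustrative only: index arithmetic is written out as explicit
    starts and strides rather than derived from an ONF.
    """
    m, n = a_shape
    p, q = b_shape
    rows, cols = m * p, n * q
    out = [0] * (rows * cols)
    for i in range(m):
        for j in range(n):
            a_ij = a[i * n + j]
            # Start offset of the p-by-q block that holds a[i][j] * B.
            start = (i * p) * cols + j * q
            for k in range(p):
                # Moving down one row of the block is a stride of `cols`.
                row_start = start + k * cols
                for l in range(q):
                    out[row_start + l] = a_ij * b[k * q + l]
    return out, (rows, cols)


def outer_flat(a, b):
    """Outer product of two flattened arrays, in (a-index, b-index) order."""
    return [x * y for x in a for y in b]


A = [1, 2, 3, 4]   # [[1, 2], [3, 4]]
B = [0, 1, 1, 0]   # [[0, 1], [1, 0]]
K, K_shape = kron_flat(A, (2, 2), B, (2, 2))
O = outer_flat(A, B)
```

Here `K` and `O` contain the same sixteen products. The Kronecker product orders them by (i, k, j, l), while the flattened outer product orders them by (i, j, k, l), so the difference between the two operations is purely one of data layout.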


Keywords: Kronecker Product, Index Vector, Outer Product, Dimensional Array, Partial Index





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. College of Computing and Information (CCI), University at Albany, State University of New York, Albany, USA
  2. Drinker Biddle & Reath, L.L.P., Washington, D.C., USA
