Scientific software libraries for scalable architectures

  • Conference paper
Parallel Scientific Computing (PARA 1994)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 879)

Abstract

Massively parallel processors introduce new demands on software systems with respect to performance, scalability, robustness and portability. The increased complexity of the memory systems and the increased range of problem sizes for which a given piece of software is used pose serious challenges to software developers. The Connection Machine Scientific Software Library, CMSSL, uses several novel techniques to meet these challenges. The CMSSL contains routines for managing the data distribution and provides data-distribution-independent functionality. High performance is achieved through careful scheduling of arithmetic operations and data motion, and through the automatic selection of algorithms at run time. We discuss some of the techniques used, and provide evidence that CMSSL has reached the goals of performance and scalability for an important set of applications.

Editor information

Jack Dongarra Jerzy Waśniewski

Copyright information

© 1994 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Johnsson, S.L., Mathur, K.K. (1994). Scientific software libraries for scalable architectures. In: Dongarra, J., Waśniewski, J. (eds) Parallel Scientific Computing. PARA 1994. Lecture Notes in Computer Science, vol 879. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0030160

  • DOI: https://doi.org/10.1007/BFb0030160

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-58712-5

  • Online ISBN: 978-3-540-49050-0

  • eBook Packages: Springer Book Archive
