Abstract
We discuss issues that arise when dense linear algebra algorithms are implemented on a distributed-memory architecture, together with the additional considerations that apply when those algorithms are packaged as general-purpose library routines. We focus on a few methods that have proven important in the CMSSL implementation of elimination-based algorithms such as LU decomposition, QR decomposition, and bidiagonalization.
We illustrate how conceptual layout manipulations, which involve no data movement, can be used to transform a multidimensional array representing multiple problems into an array with a fixed number of dimensions while preserving the given axes that define the problems.
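The idea of changing only the description of an array, and not its storage, can be sketched in a few lines. The example below is an illustrative analogy in plain Python, not the CMSSL or CM Fortran mechanism: it assumes row-major storage in a flat list, and the helper names (`row_major_strides`, `collapse_instance_axes`, `offset`) are hypothetical. The leading "instance" axes are merged into one, the two trailing problem axes are kept, and the same element is reached through either description.

```python
from math import prod


def row_major_strides(shape):
    """Strides (in elements) of a row-major array with the given shape."""
    strides = []
    step = 1
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def collapse_instance_axes(shape, n_problem_axes=2):
    """Merge all leading (instance) axes into one; keep the trailing problem axes.

    Only the shape descriptor changes; the underlying storage is untouched.
    """
    split = len(shape) - n_problem_axes
    return (prod(shape[:split]),) + tuple(shape[split:])


def offset(index, strides):
    """Position of a multi-index in the flat storage."""
    return sum(i * s for i, s in zip(index, strides))


# Hypothetical data: a 2 x 3 grid of independent 4 x 5 matrices, stored flat.
shape = (2, 3, 4, 5)
storage = list(range(prod(shape)))

new_shape = collapse_instance_axes(shape)    # one instance axis + two problem axes
old_strides = row_major_strides(shape)
new_strides = row_major_strides(new_shape)

# Instance (1, 2) of the original array is instance 1*3 + 2 = 5 after collapsing.
a = storage[offset((1, 2, 3, 4), old_strides)]
b = storage[offset((5, 3, 4), new_strides)]
```

Because `storage` is never copied or reordered, a routine written for a fixed three-dimensional shape can serve inputs with any number of instance axes; on a distributed machine the same reasoning would also have to respect how each axis is mapped to processors, which this sketch does not model.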
We discuss how performance depends on the placement of the temporary arrays used in implementing the algorithms under consideration.
We show how, in the context of our analysis, the global BLAS operations are implemented using only structured communication combined with local BLAS computations.
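As a rough illustration of that decomposition, the sketch below simulates a global matrix-vector product in a single Python process. It is a minimal sketch under stated assumptions, not the CMSSL implementation: the matrix is assumed to be stored in a plain block layout on a P x Q processor grid (not the block-cyclic layout), the communication steps are modeled by list operations, and all function names are hypothetical.

```python
def local_gemv(block, x_seg):
    """Local BLAS-like kernel: dense matrix-vector product on one node."""
    return [sum(a * x for a, x in zip(row, x_seg)) for row in block]


def split_into_blocks(matrix, P, Q):
    """Cut a dense matrix into a P x Q grid of equally sized blocks."""
    rb, cb = len(matrix) // P, len(matrix[0]) // Q
    return [[[row[q * cb:(q + 1) * cb] for row in matrix[p * rb:(p + 1) * rb]]
             for q in range(Q)] for p in range(P)]


def distributed_matvec(blocks, x_segs):
    """y = A x for A held as blocks[p][q]; x_segs[q] matches block column q."""
    P, Q = len(blocks), len(blocks[0])
    # Step 1 (structured communication): broadcast x_segs[q] down grid column q.
    received = [[x_segs[q] for q in range(Q)] for p in range(P)]
    # Step 2 (local computation): every node multiplies its own block.
    partial = [[local_gemv(blocks[p][q], received[p][q]) for q in range(Q)]
               for p in range(P)]
    # Step 3 (structured communication): sum-reduce partial results along each grid row.
    return [[sum(vals) for vals in zip(*partial[p])] for p in range(P)]


# Hypothetical 4 x 4 problem on a 2 x 2 grid.
A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
x_segs = [[1, 2], [3, 4]]                 # x = [1, 2, 3, 4], split by block column
y_segs = distributed_matvec(split_into_blocks(A, 2, 2), x_segs)
```

The only data exchange is a broadcast along one grid axis and a reduction along the other, both of which are regular patterns; everything else is an independent local kernel call on each node.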
Finally, timing results show that the performance of the routines scales as we go to larger matrices and larger machines.
© 1994 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pedersen, P.M., Balle, S.M. (1994). Selected techniques for efficient parallel implementation of dense linear algebra algorithms on the connection machine CM-5/CM-5E. In: Dongarra, J., Waśniewski, J. (eds) Parallel Scientific Computing. PARA 1994. Lecture Notes in Computer Science, vol 879. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0030171
DOI: https://doi.org/10.1007/BFb0030171
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-58712-5
Online ISBN: 978-3-540-49050-0
eBook Packages: Springer Book Archive