Abstract
Performance models are important in the design and analysis of linear algebra software for scalable high-performance computer systems. They can be used to estimate the overhead in a parallel algorithm and to measure the impact of machine characteristics and block sizes on the execution time. We present a hierarchical approach to the design of performance models for parallel algorithms in linear algebra, based on a parallel machine model and the hierarchical structure of the ScaLAPACK library. This suggests three levels of performance models corresponding to existing ScaLAPACK routines. As a proof of concept, a performance model of the high-level QR factorization routine pdgeqrf is presented. We also derive performance models of the lower-level ScaLAPACK building blocks pdgeqr2, pdlarft, pdlarfb, pdlarfg, pdlarf, pdnrm2, and pdscal, which are used in the high-level model for pdgeqrf. Predicted performance results are compared to measurements on an Intel Paragon XP/S system. The accuracy of the top-level model is over 90% for the measured matrix and block sizes and different process grid configurations.
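The idea of stacking performance models in levels can be illustrated with a minimal sketch. It is not the model derived in the paper: the machine parameters (`alpha`, `beta`, `gamma`), the binary-tree broadcast, the leading-order flop counts, and all function names are assumptions chosen for illustration. Level 1 models communication and arithmetic primitives, level 2 models building blocks (a panel factorization and a trailing-matrix update), and level 3 sums the building blocks over the panels of a blocked QR factorization on a `pr` x `pc` process grid.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Hypothetical machine parameters (not taken from the paper)."""
    alpha: float  # message start-up latency, seconds
    beta: float   # transfer time per word, seconds
    gamma: float  # time per floating point operation, seconds


# Level 1: communication and arithmetic primitives.
def t_send(m, n_words):
    return m.alpha + m.beta * n_words


def t_broadcast(m, n_words, p):
    # Assumption: binary-tree broadcast among p processes.
    steps = math.ceil(math.log2(p)) if p > 1 else 0
    return steps * t_send(m, n_words)


def t_flops(m, count):
    return m.gamma * count


# Level 2: building blocks expressed in terms of level 1.
def t_panel(m, rows, nb, pr):
    # Unblocked Householder QR of a rows x nb panel spread over pr process rows.
    flops = 2.0 * nb * nb * (rows - nb / 3.0) / pr
    # Assumption: two one-word collective steps per column (norm and scaling).
    comm = nb * 2 * t_broadcast(m, 1, pr)
    return t_flops(m, flops) + comm


def t_update(m, rows, cols, nb, pr, pc):
    # Apply a block of nb reflectors to a rows x cols trailing matrix.
    flops = 4.0 * rows * cols * nb / (pr * pc)
    comm = (t_broadcast(m, rows * nb / pr, pc)
            + t_broadcast(m, nb * cols / pc, pr))
    return t_flops(m, flops) + comm


# Level 3: the top-level routine as a sum over panels.
def t_qr(m, rows, cols, nb, pr, pc):
    total = 0.0
    k = min(rows, cols)
    for j in range(0, k, nb):
        jb = min(nb, k - j)          # width of the current panel
        r = rows - j                 # rows still to be factorized
        c = cols - j - jb            # columns in the trailing matrix
        total += t_panel(m, r, jb, pr)
        if c > 0:
            total += t_update(m, r, c, jb, pr, pc)
    return total


def efficiency(m, rows, cols, nb, pr, pc):
    # Ratio of modeled one-process time to total process-time on the grid.
    return t_qr(m, rows, cols, nb, 1, 1) / (pr * pc * t_qr(m, rows, cols, nb, pr, pc))
```

With made-up parameter values such as `Machine(alpha=1e-4, beta=1e-7, gamma=1e-8)`, calling `efficiency(m, 512, 512, nb, 2, 4)` for several values of `nb` shows how a model of this shape exposes the trade-off between block size and communication overhead. The numbers it produces say nothing about the Intel Paragon or about the accuracy reported in the abstract.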
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dackland, K., Kågström, B. (1996). An hierarchical approach for performance analysis of ScaLAPACK-based routines using the distributed linear algebra machine. In: Waśniewski, J., Dongarra, J., Madsen, K., Olesen, D. (eds) Applied Parallel Computing Industrial Computation and Optimization. PARA 1996. Lecture Notes in Computer Science, vol 1184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-62095-8_20
DOI: https://doi.org/10.1007/3-540-62095-8_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-62095-2
Online ISBN: 978-3-540-49643-4
eBook Packages: Springer Book Archive