Abstract
The present Fujitsu PRIMEPOWER 2000 system can have up to 128 processors in an SMP node. It is therefore desirable to provide users of this system with high-performance parallel BLAS and LAPACK routines that scale to as many processors as possible. It is also desirable that users can obtain some level of parallel performance merely by relinking their codes with SMP math libraries. This talk outlines the major design decisions taken in providing OpenMP versions of BLAS and LAPACK routines to users, discusses some of the algorithmic issues that have been addressed, and examines some of the shortcomings of OpenMP for this task.
A good deal has been learned about exploiting OpenMP in this ongoing activity, and the talk will attempt to identify what worked and what did not. For instance, while OpenMP does not support recursion, some of the basic ideas behind recursive linear algebra algorithms can be exploited to overlap sequential operations with parallel ones. As another example, the overheads of dynamic scheduling tended to outweigh the better load balancing that such a schedule provides, so static cyclic loop scheduling was more effective.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
Cite this paper
Addison, C. (2001). Exploiting OpenMP to Provide Scalable SMP BLAS and LAPACK Routines. In: Alexandrov, V.N., Dongarra, J.J., Juliano, B.A., Renner, R.S., Tan, C.J.K. (eds) Computational Science — ICCS 2001. ICCS 2001. Lecture Notes in Computer Science, vol 2073. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45545-0_1
Print ISBN: 978-3-540-42232-7
Online ISBN: 978-3-540-45545-5