Performance and Numerical Accuracy Evaluation of Heterogeneous Multicore Systems for Krylov Orthogonal Basis Computation

Dubois, Jérôme; Calvin, Christophe; Petiton, Serge

doi:10.1007/978-3-642-19328-6_7

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6449)

Included in the following conference series:

International Conference on High Performance Computing for Computational Science

Abstract

We study the numerical behavior of heterogeneous systems, such as a CPU paired with a GPU or with IBM Cell processors, for some orthogonalization processes. We focus on how the different floating-point arithmetic handling of these accelerators affects Gram-Schmidt orthogonalization in single and double precision. For dense matrices, we observe a loss of at worst 1 digit on CUDA-enabled GPUs together with a 20x speed-up, and a loss of 2 digits on the Cell processor together with a 7x speed-up. For sparse matrices, the CPU and GPU results are very close and the speed-up is 10x. We conclude that the Cell processor is a good accelerator for double precision because of its full IEEE compliance, but is not sufficient for single-precision applications. The GPU speed-up is better than that of the Cell, and its decent IEEE support delivers results close to the CPU ones for both precisions.
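
To illustrate the kind of measurement described in the abstract, the sketch below builds a small Krylov basis with classical Gram-Schmidt and compares the loss of orthogonality in double precision against an emulated single precision. It is not the authors' code and does not model GPU or Cell arithmetic. It only assumes that single precision can be imitated on a CPU by rounding every intermediate result to binary32. The test matrix (a 1D Laplacian), the starting vector, and all function names (`to_single`, `cgs_krylov_basis`, `orthogonality_loss`) are hypothetical choices made for this example.

```python
import math
import struct


def to_single(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def identity(x):
    """Keep the full double-precision result."""
    return x


def dot(u, v, rnd):
    """Dot product in which every product and every sum is rounded by `rnd`."""
    acc = 0.0
    for a, b in zip(u, v):
        acc = rnd(acc + rnd(a * b))
    return acc


def cgs_krylov_basis(A, v0, m, rnd):
    """Build m Krylov basis vectors with classical Gram-Schmidt (CGS).

    Every arithmetic result is passed through `rnd`, so rnd=to_single
    emulates single precision and rnd=identity keeps double precision.
    """
    v = [rnd(x) for x in v0]
    norm = rnd(math.sqrt(dot(v, v, rnd)))
    Q = [[rnd(x / norm) for x in v]]
    for _ in range(m - 1):
        # w = A * q_last (dense matrix-vector product)
        w = [dot(row, Q[-1], rnd) for row in A]
        # CGS: all projection coefficients come from the same, unmodified w
        h = [dot(q, w, rnd) for q in Q]
        for coeff, q in zip(h, Q):
            w = [rnd(wi - rnd(coeff * qi)) for wi, qi in zip(w, q)]
        norm = rnd(math.sqrt(dot(w, w, rnd)))
        Q.append([rnd(x / norm) for x in w])
    return Q


def orthogonality_loss(Q):
    """Largest |q_i . q_j - delta_ij|, evaluated in double precision."""
    worst = 0.0
    for i, qi in enumerate(Q):
        for j, qj in enumerate(Q):
            target = 1.0 if i == j else 0.0
            worst = max(worst, abs(dot(qi, qj, identity) - target))
    return worst


# Hypothetical test problem: 12x12 1D Laplacian, 5 basis vectors.
n, m = 12, 5
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
      for j in range(n)] for i in range(n)]
v0 = [1.0 / (i + 1) for i in range(n)]

loss_double = orthogonality_loss(cgs_krylov_basis(A, v0, m, identity))
loss_single = orthogonality_loss(cgs_krylov_basis(A, v0, m, to_single))
print(f"orthogonality loss, double precision:  {loss_double:.3e}")
print(f"orthogonality loss, emulated single:   {loss_single:.3e}")
```

The difference between the base-10 logarithms of the two losses gives a rough count of the digits lost when moving from double to single precision. The same measure could in principle be used to compare a CPU result with one produced on an accelerator.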

Author information

Authors and Affiliations

Commissariat à l’Énergie Atomique, CEA-Saclay/DEN/DANS/DM2S/SERMA/LLPR, F-91191, Gif-sur-Yvette Cedex, France
Jérôme Dubois & Christophe Calvin
Laboratoire d’Informatique Fondamentale de Lille, Université de Lille 1, F-59650, Villeneuve d’Ascq Cedex, France
Jérôme Dubois & Serge Petiton

Editor information

Editors and Affiliations

Faculdade de Engenharia da Universidade do Porto, Rua Dr. Roberto Frias s/n, 4200-465, Porto, Portugal
José M. Laginha M. Palma
INP (ENSEEIHT) IRIT, University of Toulouse, rue Charles-Camichel, CEDEX 7, 31071, Toulouse, France
Michel Daydé
Lawrence Berkeley National Laboratory, Berkeley, USA
Osni Marques
Faculty of Engineering, University of Porto, Rua Dr. Roberto Frias, s/n, 4200-465, Porto, Portugal
João Correia Lopes

Cite this paper

Dubois, J., Calvin, C., Petiton, S. (2011). Performance and Numerical Accuracy Evaluation of Heterogeneous Multicore Systems for Krylov Orthogonal Basis Computation. In: Palma, J.M.L.M., Daydé, M., Marques, O., Lopes, J.C. (eds) High Performance Computing for Computational Science – VECPAR 2010. VECPAR 2010. Lecture Notes in Computer Science, vol 6449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19328-6_7

DOI: https://doi.org/10.1007/978-3-642-19328-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19327-9
Online ISBN: 978-3-642-19328-6
eBook Packages: Computer Science, Computer Science (R0)
