High-Performance Matrix-Vector Multiplication on the GPU

Sørensen, Hans Henrik Brandenborg

doi:10.1007/978-3-642-29737-3_42

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7155)

Included in the following conference series: European Conference on Parallel Processing

Abstract

In this paper, we develop a high-performance GPU kernel for one of the most popular dense linear algebra operations, matrix-vector multiplication. The target hardware is the most recent Nvidia Tesla 20-series (Fermi architecture), which is designed from the ground up for scientific computing. We show that achieving high performance for dense matrix-vector multiplication is essentially a matter of fully utilizing the fine-grained parallelism of the many-core GPU. We also show that auto-tuning can be successfully applied to the GPU kernel so that it performs well for all matrix shapes and sizes.
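
As a rough illustration of the two ideas named in the abstract, the sketch below emulates on the CPU how a dense matrix-vector product y = Ax can be split into fine-grained partial sums, and how a simple auto-tuner might choose a splitting parameter by timing candidates. This is not the paper's kernel, whose details are not given on this page. The function names, the `threads_per_row` parameter, and the candidate values are all assumptions made for this example.

```python
import time


def gemv_reference(A, x):
    """Plain y = A x for a dense matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def gemv_split_rows(A, x, threads_per_row):
    """Emulate several cooperating 'threads' per row (hypothetical scheme)."""
    y = []
    for row in A:
        # Emulated thread t accumulates the strided elements t, t+T, t+2T, ...
        partial = [0.0] * threads_per_row
        for j, a in enumerate(row):
            partial[j % threads_per_row] += a * x[j]
        # Final reduction of the per-thread partial sums for this row.
        y.append(sum(partial))
    return y


def autotune(A, x, candidates=(1, 2, 4, 8)):
    """Toy auto-tuner: return the candidate with the lowest measured time."""
    best, best_time = None, float("inf")
    for t in candidates:
        start = time.perf_counter()
        gemv_split_rows(A, x, t)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = t, elapsed
    return best


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
x = [1.0, 0.0, -1.0]
y_ref = gemv_reference(A, x)
y_split = gemv_split_rows(A, x, 2)
```

On a real GPU the partial sums would typically be accumulated by threads within a block and combined by a parallel reduction, and the tuner would time actual kernel launches over a range of matrix shapes. The Python loop only mirrors that structure.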

Author information

Authors and Affiliations

Informatics and Mathematical Modelling, Technical University of Denmark, Bldg. 321, DK-2800, Lyngby, Denmark
Hans Henrik Brandenborg Sørensen

Editor information

Editors and Affiliations

Scilytics, Koellnerhofgasse 3/15A, 1010, Vienna, Austria
Michael Alexander
ICAR-CNR, Via P. Castellino, 111, 80131, Napoli, Italy
Pasqua D’Ambra
University of Amsterdam, 1090, Amsterdam, Netherlands
Adam Belloum
Innovative Computing Laboratory, The University of Tennessee, USA
George Bosilca
Department of Experimental Medicine and Clinic, University Magna Græcia, 88100, Catanzaro, Italy
Mario Cannataro
Computer Science Department, University of Pisa, Italy
Marco Danelutto
Second University of Naples, Italy
Beniamino Di Martino
TU München, Boltzmannstr. 3, 85748, Garching, Germany
Michael Gerndt
Equipe Runtime, INRIA Bordeaux Sud-Ouest, 33405, Talence Cedex, France
Emmanuel Jeannot & Raymond Namyst
Equipe HIEPACS, INRIA Bordeaux Sud-Ouest, 33405, Talence Cedex, France
Jean Roman
Oak Ridge National Laboratory, Computer Science and Mathematics Division, 37831-6164, Oak Ridge, TN, USA
Stephen L. Scott
Department of Scientific Computing, University of Vienna, Nordbergstr. 15/3C, 1090, Vienna, Austria
Jesper Larsson Traff
Computer Science and Mathematics Division, Oak Ridge National Laboratory, 37831, Oak Ridge, TN, USA
Geoffroy Vallée
Technische Universität München, Germany
Josef Weidendorfer

About this paper

Cite this paper

Sørensen, H.H.B. (2012). High-Performance Matrix-Vector Multiplication on the GPU. In: Alexander, M., et al. Euro-Par 2011: Parallel Processing Workshops. Euro-Par 2011. Lecture Notes in Computer Science, vol 7155. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29737-3_42

DOI: https://doi.org/10.1007/978-3-642-29737-3_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29736-6
Online ISBN: 978-3-642-29737-3
eBook Packages: Computer Science, Computer Science (R0)