Semiautomatic Acceleration of Sparse Matrix-Vector Product Using OpenACC

  • Conference paper
  • First Online:
Parallel Processing and Applied Mathematics

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9574)

Abstract

The aim of this paper is to show that the well-known SPARSKIT SpMV routines for the Ellpack-Itpack and Jagged Diagonal formats can be easily and successfully adapted to a hybrid GPU-accelerated computing environment using OpenACC. We formulate general guidelines for the simple steps required to transform source codes with irregular data access into efficient OpenACC programs. We also advise how to improve the performance of such programs by tuning data structures to exploit the hardware properties of GPUs. Numerical experiments show that our accelerated versions of the SPARSKIT SpMV routines achieve performance comparable to that of the corresponding CUSPARSE routines optimized by NVIDIA.
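
To make the kind of transformation described above concrete, the sketch below (an illustration for this summary, not the paper's actual SPARSKIT routine) shows an SpMV kernel for the Ellpack-Itpack format written in C and annotated with OpenACC directives. The function name spmv_ell, the array names val and col, and the zero-padded column-major layout are assumptions made for the example.

    /* Sketch: y = A*x for an n-by-n matrix stored in ELLPACK/ITPACK format.
     * val[j*n + i] holds the j-th stored nonzero of row i (rows are padded
     * with zeros up to maxnz entries), col[j*n + i] its 0-based column index.
     * The column-major n x maxnz layout is an assumption chosen so that
     * consecutive GPU threads touch consecutive memory locations.           */
    void spmv_ell(int n, int maxnz,
                  const double *restrict val, const int *restrict col,
                  const double *restrict x, double *restrict y)
    {
        /* A single OpenACC directive offloads the row loop: the iterations
         * are distributed over gangs and vector lanes on the accelerator.  */
        #pragma acc parallel loop gang vector \
            copyin(val[0:n*maxnz], col[0:n*maxnz], x[0:n]) copyout(y[0:n])
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            #pragma acc loop seq
            for (int j = 0; j < maxnz; j++)
                sum += val[j * n + i] * x[col[j * n + i]];
            y[i] = sum;
        }
    }

Storing the padded arrays column-wise (row index varying fastest) illustrates the kind of data-structure tuning the abstract refers to: adjacent GPU threads then read adjacent memory locations, which enables coalesced accesses.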

Author information

Corresponding author

Correspondence to Przemysław Stpiczyński.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Stpiczyński, P. (2016). Semiautomatic Acceleration of Sparse Matrix-Vector Product Using OpenACC. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K., Kitowski, J., Wiatr, K. (eds) Parallel Processing and Applied Mathematics. Lecture Notes in Computer Science, vol 9574. Springer, Cham. https://doi.org/10.1007/978-3-319-32152-3_14

  • DOI: https://doi.org/10.1007/978-3-319-32152-3_14

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32151-6

  • Online ISBN: 978-3-319-32152-3

  • eBook Packages: Computer Science, Computer Science (R0)
