A Comparison of Performance Tuning Process for Different Generations of NVIDIA GPUs and an Example Scientific Computing Algorithm

  • Conference paper
Parallel Processing and Applied Mathematics (PPAM 2017)

Abstract

We consider the performance of a selected computational kernel from a scientific code on different generations of NVIDIA GPUs. The test code is an OpenCL implementation of a finite element numerical integration algorithm. In this contribution we describe the performance tuning of the code, carried out by searching a parameter space associated with it. The tuning results for different generations of NVIDIA GPUs serve as a basis for analyses and conclusions.
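As an illustration only, tuning by searching a parameter space can be sketched as an exhaustive loop over all parameter combinations that keeps the fastest one. The parameter names below (workgroup_size, use_local_memory, unroll_factor) and the synthetic cost model are assumptions made for this sketch; they are not taken from the paper, which would time real OpenCL kernel runs on each GPU.

```python
import itertools


def exhaustive_search(param_space, measure):
    """Evaluate every combination in param_space and return the fastest.

    param_space maps a parameter name to the list of values to try.
    measure(params) returns an execution time (lower is better).
    """
    names = list(param_space)
    best_params, best_time = None, float("inf")
    for values in itertools.product(*(param_space[n] for n in names)):
        params = dict(zip(names, values))
        elapsed = measure(params)
        if elapsed < best_time:
            best_params, best_time = params, elapsed
    return best_params, best_time


# Hypothetical tuning parameters (assumed for illustration).
PARAM_SPACE = {
    "workgroup_size": [32, 64, 128, 256],
    "use_local_memory": [False, True],
    "unroll_factor": [1, 2, 4],
}


def synthetic_measure(params):
    """Stand-in for timing a kernel; a real tuner would build and run
    the kernel with these parameters and return the measured time."""
    cost = abs(params["workgroup_size"] - 64) / 64.0
    cost += 0.0 if params["use_local_memory"] else 0.5
    cost += abs(params["unroll_factor"] - 2) * 0.1
    return 1.0 + cost


best_params, best_time = exhaustive_search(PARAM_SPACE, synthetic_measure)
print(best_params, best_time)
```

Running the same search once per device, with a measure function that times the kernel on that device, would give one best configuration per GPU generation, which can then be compared.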

Author information

Corresponding author

Correspondence to Krzysztof Banaś.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Banaś, K., Krużel, F., Bielański, J., Chłoń, K. (2018). A Comparison of Performance Tuning Process for Different Generations of NVIDIA GPUs and an Example Scientific Computing Algorithm. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2017. Lecture Notes in Computer Science, vol 10777. Springer, Cham. https://doi.org/10.1007/978-3-319-78024-5_21

  • DOI: https://doi.org/10.1007/978-3-319-78024-5_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-78023-8

  • Online ISBN: 978-3-319-78024-5

  • eBook Packages: Computer Science, Computer Science (R0)
