Abstract
We consider the performance of a selected computational kernel from a scientific code on different generations of NVIDIA GPUs. The code used for the tests is an OpenCL implementation of a finite element numerical integration algorithm. In this contribution we describe the performance tuning of the code, carried out by searching a parameter space associated with it. The tuning results for different generations of NVIDIA GPUs serve as a basis for analyses and conclusions.
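As an illustration of what tuning by searching a parameter space can look like, the following is a minimal sketch of an exhaustive search over a small grid of kernel options. It is not the authors' tuning procedure: the parameter names (`workgroup_size`, `use_local_memory`, `unroll_factor`), their values, and the `synthetic_measure` cost function are all assumptions made for the example. In a real setting the measurement function would build the OpenCL kernel with the given options, run it on the target GPU, and return the measured execution time.

```python
import itertools


def tune(param_space, measure):
    """Exhaustively search param_space and return (best_params, best_time).

    param_space maps a parameter name to the list of values to try;
    measure(params) returns the cost (e.g. execution time) of one variant.
    """
    names = sorted(param_space)
    best_params, best_time = None, float("inf")
    # Visit every combination of parameter values.
    for values in itertools.product(*(param_space[n] for n in names)):
        params = dict(zip(names, values))
        elapsed = measure(params)
        if elapsed < best_time:
            best_params, best_time = params, elapsed
    return best_params, best_time


# Hypothetical tuning parameters; the abstract does not list the real ones.
PARAM_SPACE = {
    "workgroup_size": [32, 64, 128, 256],
    "use_local_memory": [False, True],
    "unroll_factor": [1, 2, 4],
}


def synthetic_measure(params):
    """Stand-in for 'build the kernel, run it, time it' (assumed cost model)."""
    cost = 1.0
    cost += abs(params["workgroup_size"] - 64) / 64.0
    cost += 0.0 if params["use_local_memory"] else 0.5
    cost += abs(params["unroll_factor"] - 2) * 0.25
    return cost


best_params, best_time = tune(PARAM_SPACE, synthetic_measure)
print(best_params, best_time)
```

Because the best combination generally differs from one GPU generation to another, such a search would be repeated per device, with `synthetic_measure` replaced by real timings taken on that device.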
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Banaś, K., Krużel, F., Bielański, J., Chłoń, K. (2018). A Comparison of Performance Tuning Process for Different Generations of NVIDIA GPUs and an Example Scientific Computing Algorithm. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2017. Lecture Notes in Computer Science, vol 10777. Springer, Cham. https://doi.org/10.1007/978-3-319-78024-5_21
DOI: https://doi.org/10.1007/978-3-319-78024-5_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-78023-8
Online ISBN: 978-3-319-78024-5
eBook Packages: Computer Science (R0)