Abstract
Recently GPU is widely utilized in scientific computing and engineering applications, owing primarily to the evolution of GPU architecture. Firstly, we analyze some key performance characters of GPU in detail, and the relationships among GPU architecture, programming model and memory hierarchy. Secondly, we present three performance optimization strategies: Prefetching, Streamlizing, and Task Division. Adequate experiments have been done to abstract the relationships among different factors and efficiency. Finally, we map the HPL benchmark to testify our strategies and achieve certain speedup.
This work is supported by National High Technology Development 863 Program of China under Grant No.2009AA01Z102 and National Natural Science Foundation of China under Grant No.60873016.
Chapter PDF
References
Ghuloum, A., Sprangle, E., Fang, J., Wu, G., Zhou, X.: Ct: A Flexible Parallel Programming Model for Tera-scale Architectures. Technical report, Intel Research (2007)
Gutowitz. H.: A tutorial introduction to Swarm. Technical report, The Santa Fe Institute (1993)
Monteyne, M.: RapidMind: Multi-Core Develpment Platform, RapidMind Official Page (2007), http://www.rapidmind.net/
Dongarra, J.J., Luszczek, P., Petitet, A.: The LINPACK Benchmark: Past, Present, and Future. Concurrency and Computation: Practice and Experience 15, 803–820 (2003)
Halfhill, T.R.: Parallel Processing With CUDA. Microprocessor Report (January 2008)
Stone, J.: Accelerating Computational Biology by 100x with CUDA. In: NVISION (2008) (presentation)
Hartley, T.D.R., Catalyurek, U., Ruiz, A., Igual, F., Mayo, R., Ujaldon, M.: Biomedical image analysis on a cooperative cluster of gpus and multicores. In: ICS 2008: Proceedings of the 22nd annual international conference on Supercomputing, pp. 15–25. ACM, New York (2008)
Bond, A.: Havok FX: GPU-accelerated physics for PC games. In: Proceedings of Game Developers Conference 2006 (2006)
Hagen, T.R., Lle, K.-A., Natvig, J.R.: Solving the Euler equations on graphics processing units. In: Alexandrov, V.N., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds.) ICCS 2006. LNCS, vol. 3994, pp. 220–227. Springer, Heidelberg (2006)
Zeller, C.: Cloth simulation on the GPU. In: ACM SIGGRAPH 2005 Conference Abstracts and Applications (2005)
Elsen, E., Houston, M., Vishal, V., Darve, E., Hanrahan, P., Pande, V.S.: N-Body simulation on GPUs. In: Proc. 2006 ACM/IEEE Conf. on Supercomputing, p. 188 (2006)
Phillips, J.C., Braun, R., Wang, W., Gumbart, J., Tajkhorshid, E., Villa, E., Chipot, C., Skeel, R.D., Kale, L., Schulten, K.: Scalable molecular dynamics with NAMD. J. Comp. Chem. 26, 1781–1802 (2005)
Stone, J.E., Phillips, J.C., Freddolino, P.L., Hardy, D.J., Trabuco, L.G., Schulten, K.: Accelerating molecular modeling applications with graphics processors. J. Comp. Chem. 28, 2618–2640 (2007)
Stone, S.S., Haldar, J.P., Tsao, S.C., Hwu, W.W., Liang, Z., Sutton, B.P.: Accelerating advanced MRI reconstructions on GPUs. In: ACM Computing Frontier Conference (2008)
openVIDIA, http://openvidia.sourceforge.net/
Volkov, V., Demmel, J.W.: Benchmarking GPUs to tune dense linear algebra. In: SC 2008: Proceedings of the 2008 ACM/IEEE conference on Super-computing, pp. 1–11. IEEE Press, Los Alamitos (2008)
Fatica, M.: Accelerating Linpack with CUDA on heterogenous clusters. In: GPGPU 2009. ACM, New york (2009)
Castillo, M., Chan, E., Igual, F.D., Mayo, R., Quintanaorti, E.S., Quintana-orti, G., Van De Geijn, R., Van Zee, F.G.: Making Programming Synonymous with Programming for Linear Algebra Libraries, FLAME Working Note #31. The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-08-20 (April 17, 2008)
Quintana-Orti, G., Igual, F.D., Quintana-Orti, E.S., van de Geijn, R.: Solving Dense Linear Systems on Platforms with Multiple Hardware Accelerators. In: PPoPP, pp. 121–129 (2009)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ma, A., Cai, J., Cheng, Y., Ni, X., Tang, Y., Xing, Z. (2009). Performance Optimization Strategies of High Performance Computing on GPU. In: Dou, Y., Gruber, R., Joller, J.M. (eds) Advanced Parallel Processing Technologies. APPT 2009. Lecture Notes in Computer Science, vol 5737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03644-6_12
Download citation
DOI: https://doi.org/10.1007/978-3-642-03644-6_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03643-9
Online ISBN: 978-3-642-03644-6
eBook Packages: Computer ScienceComputer Science (R0)