Abstract
General-Purpose Graphics Processing Units (GPGPUs) can effectively enhance the performance of many contemporary scientific applications. However, programming GPUs using machine-specific notations like CUDA or OpenCL can be complex and time-consuming. In addition, the resulting programs are typically fine-tuned for a particular target device. A promising alternative is to program in a conventional, machine-independent notation extended with directives and let compilers generate GPU code automatically. These compilers enable portability, increase programmer productivity and, if effective, need not impose much penalty on performance.
This paper evaluates two such compilers, PGI and Cray. We first identify a collection of standard transformations that these compilers can apply. Then, we propose a sequence of manual transformations that programmers can apply to enable the generation of efficient GPU kernels. Lastly, using the Rodinia benchmark suite, we compare the performance of the code generated by the PGI and Cray compilers with that of hand-written CUDA code. Our evaluation shows that the code produced by the PGI and Cray compilers can perform well: for 6 of the 15 benchmarks that we evaluated, the compiler-generated code achieved over 85% of the performance of a hand-tuned CUDA version.
Notes
1. In the PGI version of CFD Solver, we also had to separate the individual float values included in a structure, but this was most probably due to a bug.
Acknowledgments
This research is part of the Blue Waters sustained-petascale computing project, which is supported by NSF (award number OCI 07-25070) and the state of Illinois. It was also supported by NSF under Award CNS 1111407, by grants TIN2007-60625, TIN2010-21291-C02-01 and TIN2013-64957-C2-1-P (Spanish Government and European ERDF), and by the gaZ: T48 research group (Aragon Government and European ESF).
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Ghike, S., Gran, R., Garzarán, M.J., Padua, D. (2015). Directive-Based Compilers for GPUs. In: Brodman, J., Tu, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2014. Lecture Notes in Computer Science(), vol 8967. Springer, Cham. https://doi.org/10.1007/978-3-319-17473-0_2
Print ISBN: 978-3-319-17472-3
Online ISBN: 978-3-319-17473-0