Comparing High Performance Computing Accelerator Programming Models
Accelerator devices are becoming the norm in High Performance Computing (HPC). As more systems adopt heterogeneous architectures, portable programming models such as OpenMP and OpenACC are becoming increasingly important. The SPEC ACCEL 1.2 benchmark suite consists of comparable benchmarks in OpenCL, OpenMP 4.5, and OpenACC 2.5 that can be used to evaluate the performance of, and support for, programming models and frameworks on heterogeneous platforms. In this paper we go beyond the usual metric of overall run time and examine the individual kernels to study the usage, strengths, and weaknesses of the two prevalent portable heterogeneous programming models, OpenMP and OpenACC. Our analysis shows that benchmarks such as MRI-Q, SP, and BT perform better with OpenACC, while benchmarks such as MiniGhost, LBM, and ILBDC consistently perform better with OpenMP across supercomputers such as Titan and Summit. We take a deep dive into the kernels of four selected benchmarks to answer questions such as: Where does the benchmark spend most of its cycles? What parallelization strategy is used? Why is one programming model more performant than the other? By identifying their similarities and differences, we contrast the implementation strategies used in the SPEC ACCEL 1.2 benchmarks and provide further insight into the OpenMP and OpenACC programming models.
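As an illustration only (this code is not taken from SPEC ACCEL), the sketch below offloads the same simple SAXPY loop with each model. OpenACC 2.5 describes the loop as parallel and leaves the mapping onto the device to the compiler, whereas OpenMP 4.5 prescribes the decomposition into teams and threads explicitly. The function names, the helper `saxpy_check`, and the chosen data clauses are hypothetical assumptions made for this example.

```c
#include <assert.h>  /* used by the hypothetical saxpy_check helper */
#include <stddef.h>

/* OpenACC 2.5: mark the loop as parallel and let the compiler choose
   gangs, workers, and vector lengths. copyin/copy describe data movement. */
void saxpy_acc(size_t n, float a, const float *restrict x, float *restrict y)
{
#pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* OpenMP 4.5: explicitly request a league of teams, distribute iterations
   across teams, and run them with parallel threads inside each team.
   map clauses state which arrays move to and from the device. */
void saxpy_omp(size_t n, float a, const float *restrict x, float *restrict y)
{
#pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Hypothetical check: run both variants on identical inputs and assert
   they agree. Small integer-valued inputs keep the float arithmetic exact,
   so comparing with == is safe here. */
void saxpy_check(void)
{
    float x[8], y_acc[8], y_omp[8];
    for (size_t i = 0; i < 8; ++i) {
        x[i] = (float)i;
        y_acc[i] = 1.0f;
        y_omp[i] = 1.0f;
    }
    saxpy_acc(8, 2.0f, x, y_acc);
    saxpy_omp(8, 2.0f, x, y_omp);
    for (size_t i = 0; i < 8; ++i) {
        assert(y_acc[i] == y_omp[i]);
        assert(y_acc[i] == 2.0f * (float)i + 1.0f);
    }
}
```

Compiled without OpenACC or OpenMP support, compilers typically ignore these pragmas and both functions run serially on the host; with an offloading compiler, the directives and data clauses determine how each loop is mapped to the accelerator, which is the kind of implementation detail a kernel-level comparison examines.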
This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, under contract number DE-AC05-00OR22725. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. We would like to thank Dr. Oscar Hernandez from ORNL for his guidance and support during the writing of this manuscript.