A Beginner’s Guide to Estimating and Improving Performance Portability
Given the increasing diversity of multi- and many-core processors, portability is a desirable feature of applications designed and implemented for such platforms. Portability is widely seen as a productivity enabler, but it is also considered a major performance blocker. Performance portability has therefore emerged as the property of an application to preserve similar form and similar performance across a set of platforms. A first metric, based on extensive evaluation, has been proposed to quantify performance portability for a given application on a given set of platforms.
In this work, we explore the challenges and limitations of this performance portability metric (PPM) on two levels. First, we use five OpenACC applications and three platforms to demonstrate how to compute and interpret PPM in this context. Our results indicate specific challenges in selecting parameters and interpreting results. Second, we use controlled experiments to assess the impact of platform-specific optimizations on both performance and performance portability. Our results illustrate, for our five OpenACC applications, a clear tension between improving performance and improving performance portability.
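As a rough illustration of what computing PPM involves: the metric proposed by Pennycook et al. is commonly stated as the harmonic mean of an application's performance efficiency over the platforms in a set, and zero if the application does not run on one of them. The sketch below assumes that definition. The function name, platform labels, and efficiency values are hypothetical and are not results from this work.

```python
from typing import Mapping, Optional


def performance_portability(efficiencies: Mapping[str, Optional[float]]) -> float:
    """Harmonic mean of per-platform efficiencies (each in (0, 1]).

    A value of None marks a platform on which the application does not run;
    in that case the metric is defined as 0.
    """
    if not efficiencies:
        raise ValueError("at least one platform is required")
    values = list(efficiencies.values())
    # Unsupported (None) or non-positive efficiency on any platform gives 0.
    if any(e is None or e <= 0.0 for e in values):
        return 0.0
    return len(values) / sum(1.0 / e for e in values)


# Hypothetical efficiencies for one application on three platforms.
baseline = {"cpu": 0.5, "gpu_a": 1.0, "gpu_b": 0.5}
# Hypothetical outcome of a gpu_a-specific optimization that hurts the CPU.
tuned_for_gpu_a = {"cpu": 0.25, "gpu_a": 1.0, "gpu_b": 0.5}

pp_baseline = performance_portability(baseline)      # 3 / (2 + 1 + 2) = 0.6
pp_tuned = performance_portability(tuned_for_gpu_a)  # lower than the baseline
```

Because the harmonic mean is dominated by the smallest efficiency, an optimization that helps one platform while degrading another can lower the metric, which is the kind of tension the abstract refers to. The choice of efficiency (relative to the best observed performance or to the architectural peak) is a parameter that has to be fixed before the numbers can be compared.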
Keywords: Performance portability metric, Performance optimization, OpenACC, CPU, GPU
We would like to thank Jason Sewall and John Pennycook for their help in designing our experiments and interpreting the results.