A Beginner’s Guide to Estimating and Improving Performance Portability

  • Henk Dreuning
  • Roel Heirman
  • Ana Lucia Varbanescu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11203)


Given the increasing diversity of multi- and many-core processors, portability is a desirable feature of applications designed and implemented for such platforms. Portability is unanimously seen as a productivity enabler, but it is also considered a major performance blocker. Thus, performance portability has emerged as the property of an application to preserve a similar form and similar performance across a set of platforms; a first metric, based on extensive evaluation, has been proposed to quantify the performance portability of a given application on a given set of platforms.
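For reference, here is a restatement of that metric, assuming it is the one proposed by Pennycook, Sewall and Lee (arXiv:1611.07409). The notation below is ours: for an application a solving problem p, e_i(a, p) is its efficiency on platform i, measured either as application efficiency (fraction of the best observed performance on i) or as architectural efficiency (fraction of the peak of i), and H is the chosen set of platforms.

```latex
\mathrm{PP}(a, p, H) =
\begin{cases}
  \dfrac{|H|}{\sum_{i \in H} \dfrac{1}{e_i(a, p)}} & \text{if } a \text{ is supported on every } i \in H,\\[1ex]
  0 & \text{otherwise.}
\end{cases}
```

Because this is a harmonic mean, the value is dominated by the platform with the lowest efficiency, and a single unsupported platform in H drives it to zero.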

In this work, we explore the challenges and limitations of this performance portability metric (PPM) on two levels. First, we use five OpenACC applications and three platforms to demonstrate how to compute and interpret PPM in this context. Our results reveal specific challenges in parameter selection and result interpretation. Second, we use controlled experiments to assess the impact of platform-specific optimizations on both performance and performance portability. For our five OpenACC applications, the results illustrate a clear tension between improving performance and improving performance portability.
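To make the parameter-selection and optimization issues concrete, the following Python sketch computes PPM using application efficiency (best-known runtime on a platform divided by the achieved runtime). The runtimes, platform names, the functions `performance_portability` and `application_efficiency`, and the choice of "best known" as the minimum over the two measured versions are all hypothetical illustrations, not data or code from this study. The sketch shows how a platform-specific optimization can speed up one platform while lowering PPM, and how enlarging the platform set with an unsupported platform drives PPM to zero.

```python
"""Minimal sketch of the performance portability metric (PPM) of
Pennycook et al.; all runtimes below are hypothetical, not measurements."""
from typing import Dict, Mapping, Optional


def performance_portability(eff: Mapping[str, Optional[float]]) -> float:
    """Harmonic mean of per-platform efficiencies over the platform set H.

    A platform whose efficiency is None (unsupported) or non-positive
    makes the whole metric 0, as in the original definition.
    """
    if not eff:
        raise ValueError("platform set H must not be empty")
    values = list(eff.values())
    if any(e is None or e <= 0.0 for e in values):
        return 0.0
    return len(values) / sum(1.0 / e for e in values)


def application_efficiency(runtimes: Mapping[str, float],
                           best: Mapping[str, float]) -> Dict[str, float]:
    """Application efficiency: best-known runtime divided by achieved runtime."""
    return {p: best[p] / t for p, t in runtimes.items()}


# Hypothetical runtimes (seconds) for two versions of one application.
baseline = {"cpu": 10.0, "gpu": 4.0}
gpu_tuned = {"cpu": 25.0, "gpu": 2.0}  # faster on GPU, slower on CPU

# Hypothetical choice: best-known runtime per platform over the versions we have.
best = {p: min(baseline[p], gpu_tuned[p]) for p in baseline}

pp_baseline = performance_portability(application_efficiency(baseline, best))
pp_tuned = performance_portability(application_efficiency(gpu_tuned, best))

if __name__ == "__main__":
    print(f"PPM baseline : {pp_baseline:.3f}")  # 2 / (1/1.0 + 1/0.5) = 0.667
    print(f"PPM gpu_tuned: {pp_tuned:.3f}")     # 2 / (1/0.4 + 1/1.0) = 0.571
    # Enlarging H with a platform the code cannot run on drives PPM to 0.
    print(performance_portability({"cpu": 1.0, "gpu": 0.5, "fpga": None}))
```

Note that the best-known runtimes depend on which versions are measured: adding a faster version on either platform changes the efficiencies, and therefore both PPM values, which is one reason the metric needs careful interpretation.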


Keywords: Performance portability metric · Performance optimization · OpenACC · CPU · GPU



We would like to thank Jason Sewall and John Pennycook for their help in designing our experiments and interpreting the results.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Henk Dreuning (1) (corresponding author)
  • Roel Heirman (1)
  • Ana Lucia Varbanescu (1)
  1. University of Amsterdam, Amsterdam, The Netherlands
