The Design and Implementation of OpenMP 4.5 and OpenACC Backends for the RAJA C++ Performance Portability Layer

  • William Killian
  • Tom Scogland
  • Adam Kunen
  • John Cavazos
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10732)


Portability abstraction layers such as RAJA enable users to quickly change how a loop nest is executed with minimal modifications to high-level source code. Directive-based programming models such as OpenMP and OpenACC provide easy-to-use annotations on for-loops and regions that change the execution pattern of user code. Directive-based language backends for RAJA have previously been limited to a few options because multiplicative clause combinations create a version explosion. In this work, we introduce an updated implementation of two directive-based backends that mitigates this version explosion problem by leveraging the C++ type system and template meta-programming concepts. We implement partial OpenMP 4.5 and OpenACC backends for the RAJA portability layer which can apply loop transformations and specify how loops should be executed. We evaluate our approach by analyzing compilation and runtime overhead for both backends using PGI 17.7 and IBM Clang (OpenMP 4.5) on a collection of computation kernels.


Directive-based programming model · Performance portability · Abstraction layer · Code generation



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • William Killian (1, 2, 3)
  • Tom Scogland (1)
  • Adam Kunen (1)
  • John Cavazos (3)
  1. Lawrence Livermore National Laboratory, Livermore, USA
  2. Millersville University of Pennsylvania, Millersville, USA
  3. University of Delaware, Newark, USA
