Performance-Portable Many-Core Plasma Simulations: Porting PIConGPU to OpenPower and Beyond

  • Erik Zenker
  • René Widera
  • Axel Huebl
  • Guido Juckeland
  • Andreas Knüpfer
  • Wolfgang E. Nagel
  • Michael Bussmann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9945)


With the appearance of the heterogeneous OpenPower platform, many-core accelerator devices have been coupled with Power host processors for the first time. To utilize their full potential, it is worth investigating performance-portable algorithms that allow choosing the best-fitting hardware for each domain-specific compute task. To match even the high degree of parallelism of modern GPGPUs, our approach relies heavily on abstract meta-programming techniques, which are essential for focusing effort on fine-grained tuning rather than on code porting. With this in mind, the CUDA-based open-source plasma simulation code PIConGPU is currently being abstracted to support the heterogeneous OpenPower platform using our fast porting interface cupla, which wraps the abstract parallel C++11 kernel acceleration library Alpaka.
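The single-source idea behind this meta-programming approach can be illustrated with plain C++11 templates. The sketch below is not the Alpaka or cupla API: the names `AccSerial`, `AccThreads`, `Executor`, `launch`, and `AxpyKernel` are hypothetical. It assumes that a kernel is a functor templated on an accelerator type and that the back-end (a serial loop or `std::thread` workers here) is selected at compile time through a template specialization, so the kernel body itself never changes.

```cpp
// Minimal sketch of single-source kernels with compile-time back-end
// selection. AccSerial, AccThreads, Executor, launch and AxpyKernel are
// hypothetical names for illustration; they are not the Alpaka or cupla API.
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Accelerator tags: each carries the global index of the current work item.
struct AccSerial  { std::size_t idx; };
struct AccThreads { std::size_t idx; };

// Primary template left undefined: every back-end must provide a specialization.
template <typename TAcc>
struct Executor;

// Back-end 1: a plain serial loop over all indices.
template <>
struct Executor<AccSerial> {
    template <typename TKernel, typename... TArgs>
    static void run(std::size_t n, TKernel const& kernel, TArgs&&... args) {
        for (std::size_t i = 0; i < n; ++i) {
            kernel(AccSerial{i}, args...);
        }
    }
};

// Back-end 2: std::thread workers, each striding over the index space.
template <>
struct Executor<AccThreads> {
    template <typename TKernel, typename... TArgs>
    static void run(std::size_t n, TKernel const& kernel, TArgs&&... args) {
        unsigned const hw = std::thread::hardware_concurrency();
        std::size_t const nWorkers = hw > 0 ? hw : 1;
        std::vector<std::thread> workers;
        for (std::size_t w = 0; w < nWorkers; ++w) {
            workers.emplace_back([&, w] {
                for (std::size_t i = w; i < n; i += nWorkers) {
                    kernel(AccThreads{i}, args...);
                }
            });
        }
        for (auto& t : workers) {
            t.join(); // all workers finish before run() returns
        }
    }
};

// The kernel is written once and is generic in the accelerator type.
struct AxpyKernel {
    template <typename TAcc>
    void operator()(TAcc const& acc, std::size_t n, float a,
                    float const* x, float* y) const {
        std::size_t const i = acc.idx;
        if (i < n) {
            y[i] = a * x[i] + y[i];
        }
    }
};

// The back-end is chosen by the first template argument at compile time.
template <typename TAcc, typename TKernel, typename... TArgs>
void launch(std::size_t n, TKernel const& kernel, TArgs&&... args) {
    Executor<TAcc>::run(n, kernel, std::forward<TArgs>(args)...);
}
```

Calling `launch<AccSerial>(n, AxpyKernel{}, n, 2.0f, x, y)` or `launch<AccThreads>(...)` computes the same result, so switching back-ends is a one-token change at the call site. In a real abstraction library, further specializations (for example, a CUDA back-end) would sit behind the same interface.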

We demonstrate how PIConGPU can benefit from the tunable kernel execution strategies of the Alpaka library, achieving portability and performance with single-source kernels on conventional CPUs, Power8 CPUs and NVIDIA GPUs.
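A tunable kernel execution strategy means that the same kernel can be launched with different work divisions. The sketch below assumes a grid, block, thread, and element hierarchy in the spirit of Alpaka. `WorkDiv`, `Idx`, `ScaleKernel`, and `runSerial` are hypothetical names, not the library's API, and the back-end is only a serial CPU emulation. Each thread processes a contiguous run of elements, so a GPU-like division (many threads, one element each) and a CPU-like division (few threads, many elements each, friendlier to caches and SIMD units) produce identical results.

```cpp
// Minimal sketch of a tunable work division on a grid -> block -> thread ->
// element hierarchy. WorkDiv, Idx, ScaleKernel and runSerial are hypothetical
// names chosen for illustration; they are not the Alpaka API.
#include <cassert>
#include <cstddef>
#include <vector>

// How the index space is split up; this is the tuning knob.
struct WorkDiv {
    std::size_t blocks;          // blocks in the grid
    std::size_t threadsPerBlock; // threads in each block
    std::size_t elemsPerThread;  // contiguous elements handled by one thread
};

// What a kernel sees when it runs as one (block, thread) pair.
struct Idx {
    std::size_t block;
    std::size_t thread;
    WorkDiv div;
};

// Single-source kernel: written once, independent of the chosen division.
struct ScaleKernel {
    void operator()(Idx const& idx, std::size_t n, float a, float* data) const {
        std::size_t const globalThread =
            idx.block * idx.div.threadsPerBlock + idx.thread;
        std::size_t const first = globalThread * idx.div.elemsPerThread;
        for (std::size_t e = 0; e < idx.div.elemsPerThread; ++e) {
            std::size_t const i = first + e;
            if (i < n) {
                data[i] *= a; // the guard skips the padded tail of the last chunk
            }
        }
    }
};

// Serial CPU emulation of the hierarchy. A parallel back-end could map blocks
// to cores and the element loop to SIMD lanes without touching the kernel.
template <typename TKernel>
void runSerial(WorkDiv const& div, std::size_t n, TKernel const& kernel,
               float a, float* data) {
    // The division must cover all n elements.
    assert(div.blocks * div.threadsPerBlock * div.elemsPerThread >= n);
    for (std::size_t b = 0; b < div.blocks; ++b) {
        for (std::size_t t = 0; t < div.threadsPerBlock; ++t) {
            kernel(Idx{b, t, div}, n, a, data);
        }
    }
}
```

For ten elements, only the `WorkDiv` argument differs between, say, `WorkDiv{10, 1, 1}` and `WorkDiv{2, 1, 5}`; picking the best division per architecture then becomes a tuning decision rather than a code change.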


Keywords: OpenPower, Heterogeneous computing, HPC, C++11, CUDA, OpenMP, Particle-in-cell, Platform portability, Performance portability



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Erik Zenker (1, 2)
  • René Widera (1)
  • Axel Huebl (1, 2), corresponding author
  • Guido Juckeland (1)
  • Andreas Knüpfer (2)
  • Wolfgang E. Nagel (2)
  • Michael Bussmann (1), corresponding author

  1. Helmholtz-Zentrum Dresden–Rossendorf, Dresden, Germany
  2. Technische Universität Dresden, Dresden, Germany
