Performance-Portable Many-Core Plasma Simulations: Porting PIConGPU to OpenPower and Beyond

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9945)

Abstract

With the appearance of the heterogeneous platform OpenPower, many-core accelerator devices have been coupled with Power host processors for the first time. To utilize their full potential, it is worth investigating performance-portable algorithms that allow choosing the best-fitting hardware for each domain-specific compute task. Our approach, which suits even the high level of parallelism on modern GPGPUs, relies heavily on abstract meta-programming techniques; these are essential for focusing on fine-grained tuning rather than code porting. With this in mind, the CUDA-based open-source plasma simulation code PIConGPU is currently being abstracted to support the heterogeneous OpenPower platform using our fast porting interface cupla, which wraps the abstract parallel C++11 kernel acceleration library Alpaka.

We demonstrate how PIConGPU can benefit from the tunable kernel execution strategies of the Alpaka library, achieving portability and performance with single-source kernels on conventional CPUs, Power8 CPUs and NVIDIA GPUs.
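The idea of a single-source kernel with a swappable execution strategy can be illustrated with a minimal C++11 sketch. This is not Alpaka's or cupla's API: the names `AxpyKernel`, `SerialExec`, `BlockedExec`, `launch` and `strategiesAgree` are hypothetical and chosen only for illustration, and both strategies run on the host CPU. It assumes only that a kernel is a functor over an index and that the strategy decides how the index space is traversed.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Single-source kernel: written once, independent of how it is executed.
struct AxpyKernel {
    float a;
    const float* x;
    float* y;
    void operator()(std::size_t i) const { y[i] = a * x[i] + y[i]; }
};

// Hypothetical strategy 1: one flat loop over all indices.
struct SerialExec {
    template <typename Kernel>
    void run(std::size_t n, const Kernel& kernel) const {
        for (std::size_t i = 0; i < n; ++i) kernel(i);
    }
};

// Hypothetical strategy 2: split the index space into fixed-size blocks,
// loosely mirroring a grid/block decomposition. blockSize is the tuning knob.
struct BlockedExec {
    std::size_t blockSize;
    template <typename Kernel>
    void run(std::size_t n, const Kernel& kernel) const {
        const std::size_t step = std::max<std::size_t>(blockSize, 1);
        for (std::size_t begin = 0; begin < n; begin += step) {
            const std::size_t end = std::min(n, begin + step);
            for (std::size_t i = begin; i < end; ++i) kernel(i);
        }
    }
};

// The strategy is a template parameter, so swapping it needs no kernel change.
template <typename Exec, typename Kernel>
void launch(const Exec& exec, std::size_t n, const Kernel& kernel) {
    exec.run(n, kernel);
}

// Usage: run the same kernel under both strategies and compare the results.
bool strategiesAgree() {
    std::vector<float> x(10, 1.0f), ySerial(10, 2.0f), yBlocked(10, 2.0f);
    launch(SerialExec{}, x.size(), AxpyKernel{3.0f, x.data(), ySerial.data()});
    launch(BlockedExec{4}, x.size(), AxpyKernel{3.0f, x.data(), yBlocked.data()});
    return ySerial == yBlocked;
}
```

Because the strategy is resolved at compile time, the choice adds no runtime dispatch cost in this sketch. A threaded or GPU back end would presumably slot in as a further strategy type with the same `run` interface.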

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 654220.

Notes

  1. --use_fast_math --ftz=false -g0 -O3 -m64.

  2. -g0 -O3 -m64 -funroll-loops -march=native --param max-unroll-times=512 -ffast-math.

Author information

Corresponding authors

Correspondence to Axel Huebl or Michael Bussmann.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Zenker, E. et al. (2016). Performance-Portable Many-Core Plasma Simulations: Porting PIConGPU to OpenPower and Beyond. In: Taufer, M., Mohr, B., Kunkel, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol 9945. Springer, Cham. https://doi.org/10.1007/978-3-319-46079-6_21

  • DOI: https://doi.org/10.1007/978-3-319-46079-6_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46078-9

  • Online ISBN: 978-3-319-46079-6

  • eBook Packages: Computer Science (R0)
