Abstract
With the arrival of the heterogeneous OpenPower platform, many-core accelerator devices have been coupled with Power host processors for the first time. To exploit their full potential, it is worth investigating performance-portable algorithms that allow the best-fitting hardware to be chosen for each domain-specific compute task. Our approach suits even the high level of parallelism of modern GPGPUs and relies heavily on abstract meta-programming techniques, which let developers focus on fine-grained tuning rather than on code porting. With this in mind, the CUDA-based open-source plasma simulation code PIConGPU is currently being abstracted to support the heterogeneous OpenPower platform using our fast porting interface cupla, which wraps the abstract parallel C++11 kernel acceleration library Alpaka.
We demonstrate how PIConGPU can benefit from the tunable kernel execution strategies of the Alpaka library, achieving portability and performance with single-source kernels on conventional CPUs, Power8 CPUs and NVIDIA GPUs.
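The single-source idea can be illustrated with a minimal C++11 sketch. It does not use the Alpaka or cupla API: `SerialBackend`, `ThreadedBackend`, `AxpyKernel` and `axpy_on` are hypothetical names introduced only for this illustration, and the simple AXPY update stands in for a real simulation kernel. The kernel body is written once, and the execution strategy is selected at compile time through a template parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical back-end: runs every kernel index sequentially on the host.
struct SerialBackend {
    template <typename Kernel>
    static void run(std::size_t n, Kernel const& kernel) {
        for (std::size_t i = 0; i < n; ++i) kernel(i);
    }
};

// Hypothetical back-end: splits the index range into contiguous chunks,
// one std::thread per chunk.
struct ThreadedBackend {
    template <typename Kernel>
    static void run(std::size_t n, Kernel const& kernel) {
        if (n == 0) return;
        std::size_t workers = std::thread::hardware_concurrency();
        if (workers == 0) workers = 2;  // the value may be unknown
        if (workers > n) workers = n;
        std::size_t const chunk = (n + workers - 1) / workers;
        std::vector<std::thread> pool;
        for (std::size_t begin = 0; begin < n; begin += chunk) {
            std::size_t const end = begin + chunk < n ? begin + chunk : n;
            pool.emplace_back([&kernel, begin, end] {
                for (std::size_t i = begin; i < end; ++i) kernel(i);
            });
        }
        for (auto& t : pool) t.join();
    }
};

// The single-source kernel: written once, unaware of the back-end.
struct AxpyKernel {
    float a;
    float const* x;
    float* y;
    void operator()(std::size_t i) const { y[i] = a * x[i] + y[i]; }
};

// Computes a * x + y with the back-end chosen as a template parameter.
template <typename Backend>
std::vector<float> axpy_on(float a, std::vector<float> const& x,
                           std::vector<float> y) {
    assert(x.size() == y.size());
    Backend::run(y.size(), AxpyKernel{a, x.data(), y.data()});
    return y;
}
```

Calling `axpy_on<SerialBackend>(2.f, x, y)` and `axpy_on<ThreadedBackend>(2.f, x, y)` runs the same kernel under two strategies. In this sketch, switching strategies changes only the template argument, which is the property that leaves room for per-platform tuning without touching the kernel code.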
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 654220.
Notes
1. --use_fast_math --ftz=false -g0 -O3 -m64
2. -g0 -O3 -m64 -funroll-loops -march=native --param max-unroll-times=512 -ffast-math
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Zenker, E. et al. (2016). Performance-Portable Many-Core Plasma Simulations: Porting PIConGPU to OpenPower and Beyond. In: Taufer, M., Mohr, B., Kunkel, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol 9945. Springer, Cham. https://doi.org/10.1007/978-3-319-46079-6_21
DOI: https://doi.org/10.1007/978-3-319-46079-6_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46078-9
Online ISBN: 978-3-319-46079-6
eBook Packages: Computer Science, Computer Science (R0)