Abstract
The Particle-In-Cell (PIC) method is used in many scientific simulation codes. Good performance of the PIC approach requires data locality, which in turn relies on efficient sorting of the particles. We present a bucket sort algorithm with a small memory footprint for the PIC method, targeting Graphics Processing Units (GPUs). The performance of our sorting algorithm increases with the amount of storage provided and with the orderliness of the particles. For our application, where particles are presorted, it performs better and requires less memory than other sorting algorithms in the literature. The overall PIC algorithm performs best when the sorting is applied.
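To illustrate the general idea of sorting particles by grid cell, here is a minimal sketch of a textbook out-of-place bucket (counting) sort in plain Python. It is not the small-memory-footprint GPU algorithm presented in the paper. The function name `bucket_sort_by_cell`, the one-dimensional uniform grid, and the parameters `cell_width` and `n_cells` are assumptions made for illustration only, and positions are assumed to be non-negative.

```python
def bucket_sort_by_cell(positions, cell_width, n_cells):
    """Group particle indices by grid cell (stable, out-of-place).

    Hypothetical helper for illustration; assumes a 1D uniform grid
    and non-negative particle positions.
    """
    # Bucket key: index of the grid cell containing each particle.
    cells = [min(int(x / cell_width), n_cells - 1) for x in positions]

    # Pass 1: histogram of particles per cell.
    counts = [0] * n_cells
    for c in cells:
        counts[c] += 1

    # Pass 2: exclusive prefix sum gives the start offset of each bucket.
    offsets = [0] * n_cells
    for c in range(1, n_cells):
        offsets[c] = offsets[c - 1] + counts[c - 1]

    # Pass 3: scatter particle indices into their buckets.
    order = [0] * len(positions)
    cursor = list(offsets)
    for i, c in enumerate(cells):
        order[cursor[c]] = i
        cursor[c] += 1
    return order, offsets


positions = [0.9, 0.1, 2.5, 1.2, 0.4, 2.1]
order, offsets = bucket_sort_by_cell(positions, cell_width=1.0, n_cells=3)
sorted_positions = [positions[i] for i in order]
```

After the sort, particles that share a cell sit next to each other in memory, which is the data locality the abstract refers to. Particles within a bucket keep their original relative order, and this sketch needs a second full-size index array, which is the kind of extra storage a small-footprint variant would try to avoid.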
Acknowledgments
The authors wish to thank Peter Messmer and Jakob Progsch from NVIDIA for helpful discussions.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Jocksch, A., Hariri, F., Tran, TM., Brunner, S., Gheller, C., Villard, L. (2016). A Bucket Sort Algorithm for the Particle-In-Cell Method on Manycore Architectures. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K., Kitowski, J., Wiatr, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2015. Lecture Notes in Computer Science, vol 9573. Springer, Cham. https://doi.org/10.1007/978-3-319-32149-3_5
DOI: https://doi.org/10.1007/978-3-319-32149-3_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32148-6
Online ISBN: 978-3-319-32149-3
eBook Packages: Computer Science (R0)