Abstract
We achieve a performance of \({\sim }0.3\) GLUps on a 4-core CPU for the D3Q19 lattice Boltzmann method (LBM) by using an advanced time-space decomposition approach. The LRnLA algorithm ConeFold is used together with a new non-local mirrored vectorization. The roofline model is used for performance estimation and for the choice of parameters. The approach admits many extensions, so the developed kernel may serve as a foundation for more complex LBM variations.
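As a rough illustration of how a roofline estimate bounds LBM throughput, the sketch below takes the attainable update rate as the minimum of a compute ceiling and a memory ceiling. All names and inputs here are assumptions made for illustration and are not taken from the paper: the hardware figures passed to `roofline_glups`, the per-update flop count, double-precision storage, and a traffic model in which a plain stepwise scheme reads and writes all 19 populations once per cell update. The helper `bytes_with_temporal_reuse` is a hypothetical idealization of time-space decomposition, in which data loaded once serves several time steps.

```python
def roofline_glups(peak_gflops, bandwidth_gbs, flops_per_update, bytes_per_update):
    """Attainable giga lattice updates per second (GLUps) under the roofline model.

    The estimate is the smaller of the compute ceiling and the memory ceiling.
    """
    compute_bound = peak_gflops / flops_per_update   # GLUps if compute-limited
    memory_bound = bandwidth_gbs / bytes_per_update  # GLUps if bandwidth-limited
    return min(compute_bound, memory_bound)


Q = 19               # populations per cell in D3Q19
BYTES_PER_VALUE = 8  # assumption: double precision

# Assumed traffic of a plain stepwise update: one read and one write
# of every population per cell update (write-allocate traffic ignored).
BYTES_NAIVE = 2 * Q * BYTES_PER_VALUE  # 304 bytes per update


def bytes_with_temporal_reuse(steps_per_load):
    """Idealized traffic per update when each load serves several time steps.

    Hypothetical model: perfect reuse, no halo or boundary overhead.
    """
    return BYTES_NAIVE / steps_per_load
```

Under these assumptions, raising the number of time steps served per load lowers the bytes moved per update, which lifts the memory ceiling until the compute ceiling becomes the limit. This is the usual motivation for choosing decomposition parameters with the roofline model.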
Acknowledgement
The work is partially supported by the Russian Science Foundation (project #18-71-10004).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Perepelkina, A., Levchenko, V. (2019). LRnLA Algorithm ConeFold with Non-local Vectorization for LBM Implementation. In: Voevodin, V., Sobolev, S. (eds) Supercomputing. RuSCDays 2018. Communications in Computer and Information Science, vol 965. Springer, Cham. https://doi.org/10.1007/978-3-030-05807-4_9
DOI: https://doi.org/10.1007/978-3-030-05807-4_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05806-7
Online ISBN: 978-3-030-05807-4
eBook Packages: Computer Science, Computer Science (R0)