LRnLA Algorithm ConeFold with Non-local Vectorization for LBM Implementation

  • Anastasia Perepelkina
  • Vadim Levchenko
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 965)


We achieved a performance of \({\sim }0.3\) GLUps (giga lattice updates per second) on a 4-core CPU for the D3Q19 lattice Boltzmann method (LBM) by using an advanced time-space decomposition approach. The LRnLA algorithm ConeFold was used with a new non-local mirrored vectorization. The roofline model was used for performance estimation and parameter choice. The approach can be extended in many directions, so the developed kernel may serve as a foundation for more complex LBM variations.
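The role of the roofline model in such an estimate can be illustrated with a minimal sketch. It is not the authors' code, and every number in it is a hypothetical placeholder, not a measurement from the paper. It assumes double-precision storage, a naive streaming scheme that reads and writes all 19 populations once per update, and an idealized temporal blocking in which data loaded once is reused for several time steps (halo overhead is ignored). The function names and the bandwidth, peak, and FLOP-count values are illustrative assumptions.

```python
# Roofline-style upper bound for a D3Q19 LBM kernel (illustrative sketch).
# All hardware numbers used below are hypothetical, not taken from the paper.

Q = 19               # number of populations per cell in D3Q19
BYTES_PER_VALUE = 8  # assumption: double precision


def naive_bytes_per_update(q=Q, value_bytes=BYTES_PER_VALUE):
    """Memory traffic per cell update when every population is
    read once and written once per time step."""
    return 2 * q * value_bytes


def blocked_bytes_per_update(steps, q=Q, value_bytes=BYTES_PER_VALUE):
    """Idealized traffic per update when data stays in cache for
    `steps` consecutive time steps (assumption: no halo overhead)."""
    return naive_bytes_per_update(q, value_bytes) / steps


def roofline_glups(bandwidth_gb_s, peak_gflops, bytes_per_update, flops_per_update):
    """Roofline bound in GLUps: the smaller of the memory-limited
    and the compute-limited update rate."""
    memory_bound = bandwidth_gb_s / bytes_per_update
    compute_bound = peak_gflops / flops_per_update
    return min(memory_bound, compute_bound)


if __name__ == "__main__":
    bandwidth = 30.0   # GB/s, hypothetical memory bandwidth
    peak = 200.0       # GFLOP/s, hypothetical peak performance
    flops = 250.0      # FLOP per cell update, hypothetical

    for steps in (1, 2, 4, 8, 16):
        traffic = blocked_bytes_per_update(steps)
        bound = roofline_glups(bandwidth, peak, traffic, flops)
        print(f"steps={steps:2d}  bytes/update={traffic:6.1f}  bound={bound:.3f} GLUps")
```

Under these assumptions the naive scheme is memory-bound, and the bound grows with the number of fused time steps until the compute ceiling is reached. That qualitative behavior is the reason time-space decomposition is attractive for LBM.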


Keywords: Lattice Boltzmann method; LRnLA algorithms; Parallel computation



The work is partially supported by the Russian Science Foundation (project #18-71-10004).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Keldysh Institute of Applied Mathematics RAS, Moscow, Russia
