LRnLA Algorithm ConeFold with Non-local Vectorization for LBM Implementation

  • Conference paper
Supercomputing (RuSCDays 2018)

Abstract

We have achieved a performance of \({\sim }0.3\) GLUps (giga lattice updates per second) on a 4-core CPU for the D3Q19 lattice Boltzmann method (LBM) by taking an advanced time-space decomposition approach. The LRnLA algorithm ConeFold was used together with a new non-local mirrored vectorization. The roofline model was used for the performance estimation and for the choice of parameters. The developed kernel leaves many possibilities for extension, so it may become a foundation for more complex LBM variations.
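As an illustration of how a roofline estimate can be set up for a D3Q19 kernel, the following is a minimal sketch and not the authors' model. The function name `roofline_lups`, the hardware figures, and the FLOP count per update are hypothetical placeholders. The only fixed inputs are the 19 populations per node and the assumption of double-precision storage with one load and one store of each population per update.

```python
def roofline_lups(peak_flops, mem_bandwidth, flops_per_update, bytes_per_update):
    """Upper bound on lattice updates per second under the roofline model."""
    compute_bound = peak_flops / flops_per_update    # updates/s if compute-limited
    memory_bound = mem_bandwidth / bytes_per_update  # updates/s if bandwidth-limited
    return min(compute_bound, memory_bound)


Q = 19               # populations per node in D3Q19
BYTES_PER_VALUE = 8  # double precision (assumption)

# One load and one store of every population per update, i.e. a plain
# stepwise traversal with no temporal data reuse (assumption).
bytes_per_update = 2 * Q * BYTES_PER_VALUE

# Hypothetical hardware and kernel figures, not taken from the paper.
peak_flops = 200e9      # FLOP/s
mem_bandwidth = 30e9    # bytes/s
flops_per_update = 250  # FLOP per node update

bound = roofline_lups(peak_flops, mem_bandwidth, flops_per_update, bytes_per_update)
print(f"{bound / 1e9:.3f} GLUps")
```

With placeholder figures like these the memory term is the smaller one, so the estimate is bandwidth-limited. Reducing the bytes moved per update, which is what time-space decomposition generally aims at, raises that bound.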

Acknowledgement

The work is partially supported by the Russian Science Foundation (project #18-71-10004).

Author information

Corresponding author

Correspondence to Anastasia Perepelkina.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Perepelkina, A., Levchenko, V. (2019). LRnLA Algorithm ConeFold with Non-local Vectorization for LBM Implementation. In: Voevodin, V., Sobolev, S. (eds) Supercomputing. RuSCDays 2018. Communications in Computer and Information Science, vol 965. Springer, Cham. https://doi.org/10.1007/978-3-030-05807-4_9

  • DOI: https://doi.org/10.1007/978-3-030-05807-4_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05806-7

  • Online ISBN: 978-3-030-05807-4

  • eBook Packages: Computer Science (R0)
