Global Memory Access Modelling for Efficient Implementation of the Lattice Boltzmann Method on Graphics Processing Units

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6449)

Abstract

In this work, we investigate the global memory access mechanism on recent GPUs. For the purpose of this study, we created specific benchmark programs, which allowed us to explore the scheduling of global memory transactions. From these observations, we formulate a model capable of estimating the execution time for a large class of applications. Our main goal is to facilitate optimisation of regular data-parallel applications on GPUs. As an example, we describe our CUDA implementations of lattice Boltzmann method (LBM) flow solvers, for which our model estimated performance with less than 5% relative error.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Obrecht, C., Kuznik, F., Tourancheau, B., Roux, JJ. (2011). Global Memory Access Modelling for Efficient Implementation of the Lattice Boltzmann Method on Graphics Processing Units. In: Palma, J.M.L.M., Daydé, M., Marques, O., Lopes, J.C. (eds) High Performance Computing for Computational Science – VECPAR 2010. VECPAR 2010. Lecture Notes in Computer Science, vol 6449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19328-6_16

  • DOI: https://doi.org/10.1007/978-3-642-19328-6_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19327-9

  • Online ISBN: 978-3-642-19328-6

  • eBook Packages: Computer Science, Computer Science (R0)
