Design, Implementation and Performance Analysis of a CFD Task-Based Application for Heterogeneous CPU/GPU Resources

  • Conference paper
High Performance Computing for Computational Science – VECPAR 2018 (VECPAR 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11333)

Abstract

The development of parallel solutions for contemporary heterogeneous supercomputers is complex and challenging, especially with respect to coding, performance analysis, and behavioral characterization. The task-based programming model is one alternative that reduces the burden on the programmer. In this model, the application is divided into tasks whose dependencies form a directed acyclic graph (DAG), and the DAG is submitted to a runtime scheduler that maps tasks to resources. In this paper, we present the design, development, and performance analysis of a task-based heterogeneous (CPU and GPU) application for a Computational Fluid Dynamics (CFD) problem that simulates the flow of an incompressible Newtonian fluid with constant viscosity. We implement our solution on top of the StarPU runtime and use the StarVZ toolkit to conduct a comprehensive performance analysis. Results indicate that, relative to the serial version on the target machine, our solution provides a 6.5× speedup using 7 CPU workers and a 60× speedup using 5 CPU and 2 GPU workers.
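
The task-based model described in the abstract can be illustrated with a minimal sketch. The Python below is not StarPU and is not the authors' code: it is a toy greedy scheduler, and the names `schedule`, `tasks`, `deps`, and `workers`, as well as the task costs, are hypothetical choices made only for illustration. It shows the two ingredients the abstract names: tasks whose dependencies form a DAG, and a scheduler that maps each ready task to a resource.

```python
from collections import deque


def schedule(tasks, deps, workers):
    """Toy greedy scheduler that maps DAG tasks to workers.

    tasks:   dict, task name -> cost in hypothetical time units
    deps:    dict, task name -> list of tasks that must finish first
    workers: list of worker names (for example CPU or GPU workers)
    Returns a dict, task name -> (worker, start, end).
    """
    # Count unfinished predecessors and record successors of every task.
    remaining = {t: len(deps.get(t, [])) for t in tasks}
    children = {t: [] for t in tasks}
    for t, parents in deps.items():
        for p in parents:
            children[p].append(t)

    # Tasks with no predecessors are ready immediately.
    ready = deque(sorted(t for t, n in remaining.items() if n == 0))
    free_at = {w: 0 for w in workers}
    placed = {}

    while ready:
        t = ready.popleft()
        # A task cannot start before all of its predecessors have ended.
        data_ready = max((placed[p][2] for p in deps.get(t, [])), default=0)
        # Pick the worker that becomes free first.
        w = min(workers, key=lambda name: free_at[name])
        start = max(free_at[w], data_ready)
        end = start + tasks[t]
        placed[t] = (w, start, end)
        free_at[w] = end
        # Release successors whose predecessors are now all scheduled.
        for c in children[t]:
            remaining[c] -= 1
            if remaining[c] == 0:
                ready.append(c)

    if len(placed) != len(tasks):
        raise ValueError("dependency cycle: the graph is not a DAG")
    return placed


# Hypothetical diamond-shaped DAG: a -> {b, c} -> d.
tasks = {"a": 1, "b": 2, "c": 2, "d": 1}
deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
plan = schedule(tasks, deps, ["cpu0", "cpu1"])
makespan = max(end for _, _, end in plan.values())
```

In this sketch the independent tasks `b` and `c` can run on different workers, so the makespan is shorter than the sum of all task costs. A production runtime additionally accounts for data transfers and for per-resource performance models, which this toy version ignores.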

Acknowledgements

This study was financed by the National Council for Scientific and Technological Development (CNPq). We thank the following projects for supporting this investigation: FAPERGS GreenCloud (16/488-9), FAPERGS MultiGPU (16/354-8), CNPq 447311/2014-0, CAPES/Brafitec EcoSud 182/15, and CAPES/Cofecub 899/18. The companion material is hosted by CERN’s Zenodo, for which we are also grateful.

Author information

Corresponding author

Correspondence to Lucas Mello Schnorr.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Nesi, L.L., Schnorr, L.M., Navaux, P.O.A. (2019). Design, Implementation and Performance Analysis of a CFD Task-Based Application for Heterogeneous CPU/GPU Resources. In: Senger, H., et al. High Performance Computing for Computational Science – VECPAR 2018. VECPAR 2018. Lecture Notes in Computer Science, vol 11333. Springer, Cham. https://doi.org/10.1007/978-3-030-15996-2_3

  • DOI: https://doi.org/10.1007/978-3-030-15996-2_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-15995-5

  • Online ISBN: 978-3-030-15996-2

  • eBook Packages: Computer Science, Computer Science (R0)
