Large-Scale Parallelization Based on CPU and GPU Cluster for Cosmological Fluid Simulations

Meng, Chen; Wang, Long; Cao, Zongyan; Feng, Long-long; Zhu, Weishan

doi:10.1007/978-3-642-53962-6_18


Part of the book series: Communications in Computer and Information Science (CCIS, volume 405)

Included in the following conference series:

International Conference on Parallel Computing in Fluid Dynamics


Abstract

In this study, we present our parallel implementation for large-scale cosmological simulations of 3D supersonic fluids on CPU and GPU clusters. Our development builds on WIGEON, an OpenMP-parallelized CPU code. Using the GPU as an accelerator yields a speedup of 13 to 31 over the sequential Fortran code, depending on the specific GPU card. Furthermore, our results show that the pure MPI parallelization scales very well up to ten thousand CPU cores. In addition, we introduce a hybrid CPU/GPU parallelization scheme and present a detailed analysis of its speedup and scaling across different numbers of CPUs and GPU cards (up to 256 GPU cards, owing to limits on computing resources). The scaling efficiency and high speedup rely on a domain decomposition approach, optimization of the WENO algorithm, and a series of techniques for optimizing the CUDA implementation, especially its memory access pattern. We believe this hybrid MPI+CUDA code is an excellent candidate for 10-petascale computing and beyond.
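The abstract names the WENO (weighted essentially non-oscillatory) algorithm as the numerical kernel being optimized but does not spell it out. As an illustration only, the following is a minimal sketch of the classic fifth-order WENO reconstruction at a single cell interface. It assumes the standard Jiang-Shu smoothness indicators and linear weights; the abstract does not state which WENO variant or order the code uses, and the function name `weno5_left` and the parameter `eps` are hypothetical choices made for this sketch, not names from WIGEON.

```python
def weno5_left(v_m2, v_m1, v_0, v_p1, v_p2, eps=1e-6):
    """Left-biased fifth-order WENO value at the interface i+1/2.

    Inputs are the five cell averages v[i-2] .. v[i+2].
    Assumption: classic Jiang-Shu weights; eps avoids division by zero.
    """
    # Third-order candidate reconstructions on the three sub-stencils.
    q0 = (2.0 * v_m2 - 7.0 * v_m1 + 11.0 * v_0) / 6.0
    q1 = (-v_m1 + 5.0 * v_0 + 2.0 * v_p1) / 6.0
    q2 = (2.0 * v_0 + 5.0 * v_p1 - v_p2) / 6.0

    # Smoothness indicators: large where a sub-stencil crosses a steep gradient.
    b0 = 13.0 / 12.0 * (v_m2 - 2.0 * v_m1 + v_0) ** 2 \
        + 0.25 * (v_m2 - 4.0 * v_m1 + 3.0 * v_0) ** 2
    b1 = 13.0 / 12.0 * (v_m1 - 2.0 * v_0 + v_p1) ** 2 \
        + 0.25 * (v_m1 - v_p1) ** 2
    b2 = 13.0 / 12.0 * (v_0 - 2.0 * v_p1 + v_p2) ** 2 \
        + 0.25 * (3.0 * v_0 - 4.0 * v_p1 + v_p2) ** 2

    # Nonlinear weights built from the linear weights 1/10, 6/10, 3/10.
    a0 = 0.1 / (eps + b0) ** 2
    a1 = 0.6 / (eps + b1) ** 2
    a2 = 0.3 / (eps + b2) ** 2
    total = a0 + a1 + a2

    # Convex combination of the candidates.
    return (a0 * q0 + a1 * q1 + a2 * q2) / total


if __name__ == "__main__":
    # Smooth (linear) data: all three candidates agree on the interface value.
    print(weno5_left(-2.0, -1.0, 0.0, 1.0, 2.0))
    # Data with a jump: the weights favor the sub-stencil that avoids it.
    print(weno5_left(0.0, 0.0, 0.0, 1.0, 1.0))
```

Each interface value depends only on a five-point stencil, which is why this kind of kernel maps naturally onto one GPU thread per cell and why the pattern of memory accesses across neighboring threads matters for performance.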



Author information

Authors and Affiliations

Supercomputing Center of Computer Network Information Center, Chinese Academy of Sciences, No.4 South 4th Street, ZhongGuanCun, Beijing, 100190, China
Chen Meng, Long Wang & Zongyan Cao
University of Chinese Academy of Sciences, No.19 YuQuan Road, ShiJingShan, Beijing, 100049, China
Chen Meng
National Astronomical Observatories, Chinese Academy of Sciences, 20A Datun Road, Chaoyang District, Beijing, 100012, China
Zongyan Cao
Purple Mountain Observatory, Chinese Academy of Sciences, 2 West Beijing Road, Nanjing, 210008, China
Long-long Feng & Weishan Zhu


Editor information

Editors and Affiliations

College of Information Science and Engineering, Hunan University, 410082, Changsha, China
Kenli Li
College of Information Science and Engineering, Hunan University, #2, South Lushan Road, Yuelu District, 410082, Changsha, China
Zheng Xiao & Jiayi Du
College of Information Science and Engineering, Northeastern University, 110004, Shenyang, China
Yan Wang
Hunan University, State University of New York at New Paltz, 12561, New Paltz, NY, USA
Keqin Li


About this paper

Cite this paper

Meng, C., Wang, L., Cao, Z., Feng, L.-L., Zhu, W. (2014). Large-Scale Parallelization Based on CPU and GPU Cluster for Cosmological Fluid Simulations. In: Li, K., Xiao, Z., Wang, Y., Du, J., Li, K. (eds) Parallel Computational Fluid Dynamics. ParCFD 2013. Communications in Computer and Information Science, vol 405. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53962-6_18


DOI: https://doi.org/10.1007/978-3-642-53962-6_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53961-9
Online ISBN: 978-3-642-53962-6
eBook Packages: Computer Science, Computer Science (R0)
