Abstract
In this paper, we present an MPI-CUDA implementation of our in-house CFD software HOSTA to accelerate large-scale high-order CFD simulations on the TianHe-1A supercomputer. HOSTA employs a fifth-order weighted compact nonlinear scheme (WCNS-E5) for flux calculation and a Runge-Kutta method for time integration. In our GPU parallelization scheme, we use CUDA thread blocks to efficiently exploit fine-grained parallelism within a 3D grid block, and multiple CUDA streams to exploit coarse-grained parallelism among multiple grid blocks. At the CUDA-device level, we decompose complex flux kernels to optimize GPU performance. At the cluster level, we present a Scatter-Gather optimization to reduce the PCI-E data transfer times for 3D block boundary/singularity data, and we overlap MPI communication with GPU execution. We achieve a speedup of about 10 when comparing our GPU code on a Tesla M2050 with the serial code on a Xeon X5670, and our implementation scales well to 128 GPUs on TianHe-1A.
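The abstract does not spell out how the Scatter-Gather optimization works, so the following is only a minimal host-side C++ sketch of the general idea, under stated assumptions: the boundary values of a 3D block are strided in memory, and packing them into one contiguous buffer lets a single transfer replace many small ones. The `Block` layout and the functions `gather_i_face` and `scatter_i_face` are hypothetical names introduced for illustration and are not taken from HOSTA.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 3D grid block stored contiguously with i as the fastest index.
// This layout is an assumption for illustration, not HOSTA's data structure.
struct Block {
    std::size_t ni, nj, nk;
    std::vector<double> data;  // holds ni * nj * nk values

    Block(std::size_t ni_, std::size_t nj_, std::size_t nk_)
        : ni(ni_), nj(nj_), nk(nk_), data(ni_ * nj_ * nk_, 0.0) {}

    // Linear offset of cell (i, j, k).
    std::size_t index(std::size_t i, std::size_t j, std::size_t k) const {
        return i + ni * (j + nj * k);
    }
};

// Gather: copy the plane i == i_face, whose nj * nk values are strided in
// memory, into one contiguous staging buffer (j varies fastest in the buffer).
std::vector<double> gather_i_face(const Block& b, std::size_t i_face) {
    assert(i_face < b.ni);
    std::vector<double> buf;
    buf.reserve(b.nj * b.nk);
    for (std::size_t k = 0; k < b.nk; ++k) {
        for (std::size_t j = 0; j < b.nj; ++j) {
            buf.push_back(b.data[b.index(i_face, j, k)]);
        }
    }
    return buf;
}

// Scatter: the inverse operation, writing a contiguous buffer back into the
// strided plane i == i_face of the destination block.
void scatter_i_face(Block& b, std::size_t i_face,
                    const std::vector<double>& buf) {
    assert(i_face < b.ni);
    assert(buf.size() == b.nj * b.nk);
    std::size_t n = 0;
    for (std::size_t k = 0; k < b.nk; ++k) {
        for (std::size_t j = 0; j < b.nj; ++j) {
            b.data[b.index(i_face, j, k)] = buf[n++];
        }
    }
}
```

In a GPU setting, the gather would plausibly run as a device kernel, the staging buffer would cross PCI-E in a single copy, and the scatter would run on the receiving side; that mapping is an assumption here and not a description of the authors' code.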
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xu, C. et al. (2014). Accelerating High-Order CFD Simulations for Multi-block Structured Grids on the TianHe-1A Supercomputer. In: Li, K., Xiao, Z., Wang, Y., Du, J., Li, K. (eds) Parallel Computational Fluid Dynamics. ParCFD 2013. Communications in Computer and Information Science, vol 405. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53962-6_17
DOI: https://doi.org/10.1007/978-3-642-53962-6_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53961-9
Online ISBN: 978-3-642-53962-6
eBook Packages: Computer Science (R0)