Concurrent Parallel Processing on Graphics and Multicore Processors with OpenACC and OpenMP
Hierarchical parallel computing is rapidly becoming ubiquitous in high performance computing (HPC) systems. Programming models commonly used in turbomachinery and other engineering simulation codes have traditionally relied upon distributed-memory parallelism with MPI and have ignored thread and data parallelism. This paper presents methods for programming multi-block codes for concurrent computation on host multicore CPUs and many-core accelerators such as graphics processing units (GPUs). Portable, standardized language directives are used to expose data and thread parallelism within the hybrid shared- and distributed-memory simulation system. A single-source/multiple-object strategy is used to simplify code management and allow for heterogeneous computing. Automated load balancing is implemented to determine what portions of the domain are computed by the multicore CPUs and GPUs. Preliminary results indicate that a moderate overall speed-up is possible by taking advantage of all processors and accelerators on a given HPC node.
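As an illustration only, the single-source/multiple-object idea and the load-balancing step might be sketched in C as follows. This is not the paper's code: the function names `update_block` and `gpu_share`, the loop body, and the rate-proportional splitting rule are all hypothetical, chosen to show how one source file can carry both OpenACC and OpenMP directives and how a measured per-cell cost could decide the CPU/GPU split.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: y[i] += a * x[i] over one block of n cells.
   Single-source/multiple-object: the same file is compiled twice,
   once with OpenACC enabled (accelerator object) and once with
   OpenMP enabled (host object). With neither enabled, the pragmas
   are skipped and the loop runs serially. */
void update_block(long n, double a, const double *x, double *y)
{
#if defined(_OPENACC)
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
#elif defined(_OPENMP)
    #pragma omp parallel for
#endif
    for (long i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}

/* Hypothetical load balancer (an assumption, not the paper's method):
   given measured seconds per cell on each device, return how many of
   n_cells to assign to the GPU so both devices finish at about the
   same time. The remainder goes to the CPU cores. */
size_t gpu_share(size_t n_cells, double cpu_sec_per_cell,
                 double gpu_sec_per_cell)
{
    assert(cpu_sec_per_cell > 0.0 && gpu_sec_per_cell > 0.0);
    double cpu_rate = 1.0 / cpu_sec_per_cell;   /* cells per second */
    double gpu_rate = 1.0 / gpu_sec_per_cell;   /* cells per second */
    double frac = gpu_rate / (gpu_rate + cpu_rate);
    return (size_t)(frac * (double)n_cells + 0.5);  /* nearest cell */
}
```

In a multi-block code the split would more likely be made over whole blocks than over individual cells; the per-cell form is used here only to keep the sketch short.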
Keywords: High-performance computing, Heterogeneous, Accelerators
This material is based upon work supported by, or in part by, the Department of Defense High Performance Computing Modernization Program (HPCMP) under User Productivity Enhancement, Technology Transfer, and Training (PETTT) contract number GS04T09DBC0017.
US Department of Defense (DoD) Distribution Statement A: Approved for public release. Distribution is unlimited.