Implicit Low-Order Unstructured Finite-Element Multiple Simulation Enhanced by Dense Computation Using OpenACC
In this paper, we develop a low-order three-dimensional finite-element solver for fast multiple-case crustal deformation computation on GPU-based systems. Starting from a high-performance solver designed for massively parallel CPU-based systems, we modify the algorithm to reduce random data access and then insert OpenACC directives. Developing an algorithm appropriate for each computer architecture allows the solver to attain higher performance. On ten Reedbush-H nodes (20 P100 GPUs), the developed solver attained a 14.2-fold speedup over the original solver on 20 K computer nodes. On the newest Volta-generation V100 GPUs, the solver attained a further 2.52-fold speedup over the P100 GPUs. As a demonstrative example, we computed 368 cases of crustal deformation analysis of northeast Japan with 400 million degrees of freedom. Algorithm modification and porting together took only two weeks, showing that a large performance improvement was achieved at low development cost. With the developed solver, we expect many-case analyses on a wide range of GPU-based systems to improve the reliability of crustal deformation analyses.
We thank Mr. Craig Toepfer (NVIDIA) and Mr. Yukihiko Hirano (NVIDIA) for their generous support and performance analyses concerning the use of the NVIDIA DGX-1 (Volta V100 GPU) and NVIDIA DGX-1 (Pascal P100 GPU) environments. Part of the results were obtained using the K computer at the RIKEN Advanced Institute for Computational Science (Proposal numbers: hp160221, hp160160, 160157, and hp170249). This work was supported by the Post K computer project (priority issue 3: Development of Integrated Simulation Systems for Hazard and Disaster Induced by Earthquake and Tsunami), the Japan Society for the Promotion of Science (KAKENHI Grant Numbers 15K18110, 26249066, 25220908, and 17K14719), and FOCUS Establishing Supercomputing Center of Excellence.