Abstract
Accelerating applications while preserving portability and maintainability is one of the major challenges in science and engineering. We previously developed a fast implicit low-order three-dimensional finite element solver whose complicated algorithm incorporates artificial intelligence and transprecision computing. In addition, every available tuning for the target architecture was applied; as a result, the solver has poor portability and maintainability. In this paper, we apply OpenACC to the solver. OpenACC's directive-based approach allows GPU computation to be introduced at a lower development cost, even for complex codes. In performance measurements on AI Bridging Cloud Infrastructure (ABCI), a reasonable speedup was attained on GPUs: the elapsed time of the entire solver was reduced to 1/14 of that of the original CPU implementation running on CPUs. Our proposed template for transprecision computing with our custom FP21 data type is publicly available and can therefore serve as a working example for other scientific computing applications.
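To give a concrete sense of what a custom FP21 data type can involve, the following is a minimal Python sketch rather than the authors' OpenACC template. It assumes that FP21 keeps the sign bit, the 8 exponent bits, and the upper 12 fraction bits of an IEEE 754 single-precision value, that the remaining 11 fraction bits are dropped by truncation, and that three FP21 values share one 64-bit word. None of these details are stated in the abstract, and the function names are hypothetical.

```python
import struct

FP21_BITS = 21                  # assumed layout: 1 sign + 8 exponent + 12 fraction bits
DROPPED_BITS = 32 - FP21_BITS   # 11 low fraction bits of FP32 are discarded
MASK21 = (1 << FP21_BITS) - 1


def f32_to_bits(x: float) -> int:
    """Return the 32-bit pattern of x after rounding it to single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(u: int) -> float:
    """Interpret a 32-bit pattern as a single-precision value."""
    return struct.unpack("<f", struct.pack("<I", u))[0]


def fp21_pack3(a: float, b: float, c: float) -> int:
    """Pack three values into one 64-bit word (63 bits used), truncating each to 21 bits."""
    wa, wb, wc = (f32_to_bits(v) >> DROPPED_BITS for v in (a, b, c))
    return (wa << 42) | (wb << 21) | wc


def fp21_unpack3(word: int) -> tuple:
    """Recover three single-precision values; the dropped fraction bits come back as zeros."""
    return tuple(
        bits_to_f32(((word >> shift) & MASK21) << DROPPED_BITS)
        for shift in (42, 21, 0)
    )
```

Under these assumptions, the packed form is used only for storage and memory transfer, while arithmetic is carried out on the unpacked single-precision values. Values whose fraction fits in 12 bits round-trip exactly; for other normal values the relative truncation error stays below 2^-12.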
Data Availability Statement
Summary of the Experiments Reported
We ran our built-from-scratch implicit solver for unstructured finite elements on AI Bridging Cloud Infrastructure (ABCI) with the PGI compiler and OpenMPI.
Artifact Availability
Software Artifact Availability: Some author-created software artifacts are NOT maintained in a public repository or are NOT available under an OSI-approved license.
Hardware Artifact Availability: There are no author-created hardware artifacts.
Data Artifact Availability: Some author-created data artifacts are NOT maintained in a public repository or are NOT available under an OSI-approved license.
Proprietary Artifacts: There are associated proprietary artifacts that are not created by the authors. Some author-created artifacts are proprietary.
List of URLs and/or DOIs where artifacts are available:
http://doi.org/10.6084/m9.figshare.11603382
http://www.data.jma.go.jp/svd/eqev/data/kyoshin/jishin/hyogo_nanbu/dat/H1171931.csv
Details regarding the baseline experimental setup and the modifications made for the paper are available at [26].
References
Bielak, J., Ghattas, O., Kim, E.: Parallel octree-based finite element method for large-scale earthquake ground motion simulation. Comput. Model. Eng. Sci. 10(2), 99 (2005)
Computing Resources of AI Bridging Cloud Infrastructure. https://abci.ai/en/about_abci/computing_resource.html. Accessed 11 Oct 2019
Farber, R.: Parallel programming with OpenACC. Newnes, Oxford (2016)
Fujita, K., et al.: Development of element-by-element kernel algorithms in unstructured implicit low-order finite-element earthquake simulation for many-core Wide-SIMD CPUs. In: Rodrigues, J., et al. (eds.) ICCS 2019. LNCS, vol. 11536, pp. 267–280. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-22734-0_20
Fujita, K., Yamaguchi, T., Ichimura, T., Hori, M., Maddegedara, L.: Acceleration of element-by-element kernel in unstructured implicit low-order finite-element earthquake simulation using OpenACC on Pascal GPUs. In: Proceedings of the Third International Workshop on Accelerator Programming Using Directives, pp. 1–12. IEEE Press (2016)
Golub, G.H., Ye, Q.: Inexact preconditioned conjugate gradient method with inner-outer iteration. SIAM J. Sci. Comput. 21(4), 1305–1320 (1999)
Ichimura, T., et al.: A fast scalable implicit solver with concentrated computation for nonlinear time-evolution problems on low-order unstructured finite elements. In: 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 620–629. IEEE (2018)
Ichimura, T., et al.: A fast scalable implicit solver for nonlinear time-evolution earthquake city problem on low-ordered unstructured finite elements with artificial intelligence and transprecision computing. In: SC18: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 627–637. IEEE (2018)
Idriss, I.M., Dobry, R., Sing, R.: Nonlinear behavior of soft clays during cyclic loading. J. Geotech. Geoenviron. Eng. 104(ASCE 14265), 1427–1447 (1978)
Japan Meteorological Agency: Strong ground motion of The Southern Hyogo prefecture earthquake in 1995 observed at Kobe JMA observatory. https://www.data.jma.go.jp/svd/eqev/data/kyoshin/jishin/hyogo_nanbu/dat/H1171931.csv. Accessed 11 Oct 2018
Jeong, S., Solenthaler, B., Pollefeys, M., Gross, M., et al.: Data-driven fluid simulations using regression forests. ACM Trans. Graph. (TOG) 34(6), 199 (2015)
Jia, Z., Maggioni, M., Staiger, B., Scarpazza, D.P.: Dissecting the NVIDIA Volta GPU architecture via microbenchmarking. arXiv preprint arXiv:1804.06826 (2018)
Kahan, W.: IEEE standard 754 for binary floating-point arithmetic. Lecture Notes Status IEEE 754(94720–1776), 11 (1996)
Kurth, T., et al.: Exascale deep learning for climate analytics. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, p. 51. IEEE Press (2018)
Malossi, A.C.I., et al.: The transprecision computing paradigm: concept, design, and applications. In: 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1105–1110. IEEE (2018)
Masing, G.: Eigenspannungen und Verfestigung beim Messing. In: Proceedings of the 2nd International Congress of Applied Mechanics (1926)
Micikevicius, P.: 3D finite difference computation on GPUs using CUDA. In: Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, pp. 79–84. ACM (2009)
Miyazaki, H., Kusano, Y., Shinjou, N., Shoji, F., Yokokawa, M., Watanabe, T.: Overview of the K computer system. Fujitsu Sci. Tech. J. 48(3), 302–309 (2012)
NVIDIA Tesla V100 GPU architecture. https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf. Accessed 11 Oct 2019
OpenACC. http://www.openacc.org/. Accessed 11 Oct 2019
Saad, Y.: Iterative Methods for Sparse Linear Systems, vol. 82. SIAM, Philadelphia (2003)
Summit. https://www.olcf.ornl.gov/olcf-resources/compute-systems/summit/. Accessed 11 Oct 2019
Using bfloat16 with TensorFlow models. https://cloud.google.com/tpu/docs/bfloat16. Accessed 11 Oct 2019
Winget, J.M., Hughes, T.J.: Solution algorithms for nonlinear transient heat conduction analysis employing element-by-element iterative strategies. Comput. Methods Appl. Mech. Eng. 52(1–3), 711–815 (1985)
Yamaguchi, T., Fujita, K., Ichimura, T., Hori, M., Lalith, M., Nakajima, K.: Implicit low-order unstructured finite-element multiple simulation enhanced by dense computation using OpenACC. In: Chandrasekaran, S., Juckeland, G. (eds.) WACCPD 2017. LNCS, vol. 10732, pp. 42–59. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-74896-2_3
Yamaguchi, T., Fujita, K., Ichimura, T., Naruse, A., Lalith, M., Hori, M.: FP21AXPY. Figshare (2020). https://doi.org/10.6084/m9.figshare.11603382. Accessed 22 Jan 2020
Acknowledgement
Our results were obtained using the computational resources of AI Bridging Cloud Infrastructure (ABCI), National Institute of Advanced Industrial Science and Technology (AIST). We acknowledge support from the Post K computer project (Priority Issue 3 - Development of integrated simulation systems for hazards and disasters induced by earthquakes and tsunamis) and the Japan Society for the Promotion of Science (17K14719, 18H05239, 18K18873). Part of our results was obtained using Summit at the Oak Ridge Leadership Computing Facility, a US Department of Energy Office of Science User Facility at Oak Ridge National Laboratory.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Yamaguchi, T., Fujita, K., Ichimura, T., Naruse, A., Lalith, M., Hori, M. (2020). GPU Implementation of a Sophisticated Implicit Low-Order Finite Element Solver with FP21-32-64 Computation Using OpenACC. In: Wienke, S., Bhalachandra, S. (eds) Accelerator Programming Using Directives. WACCPD 2019. Lecture Notes in Computer Science, vol 12017. Springer, Cham. https://doi.org/10.1007/978-3-030-49943-3_1
DOI: https://doi.org/10.1007/978-3-030-49943-3_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49942-6
Online ISBN: 978-3-030-49943-3
eBook Packages: Computer Science, Computer Science (R0)