GPU Acceleration of Hermite Methods for the Simulation of Wave Propagation
The Hermite methods of Goodrich, Hagstrom, and Lorenz (2006) use Hermite interpolation to construct high-order numerical methods for hyperbolic initial value problems. The structure of these methods has several features that are favorable for parallel computing. In this work, we propose algorithms that take advantage of the many-core architecture of Graphics Processing Units (GPUs). The algorithms exploit the compact stencil of Hermite methods and use data structures that allow for efficient data loads and stores. Additionally, the highly localized evolution operator of Hermite methods allows us to incorporate multi-stage time-stepping methods into the new algorithms while incurring minimal accesses to global memory. Using a scalar linear wave equation, we study the algorithms by treating Hermite interpolation and evolution as individual kernels and, alternatively, by combining them into a monolithic kernel. For both approaches we demonstrate strategies to increase performance. Our numerical experiments show that although a two-kernel approach allows for better performance on the hardware, a monolithic kernel can offer a comparable time to solution with less global memory usage.
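To make the interpolate-then-evolve structure concrete, the following is a minimal CPU sketch and not the GPU algorithm of this work. It makes several simplifying assumptions: it solves the 1D periodic advection equation u_t + c u_x = 0 rather than the scalar wave equation, it uses the lowest-order case (function value and first derivative per node, giving a cubic interpolant), and it is written in plain Python. All function names (`hermite_eval`, `half_step`, `step`) are hypothetical. For advection, evolving the cell polynomial over half a time step amounts to evaluating it at a shifted point, so interpolation and evolution are fused in one routine here, loosely analogous to a monolithic kernel.

```python
import math

def hermite_eval(f0, d0, f1, d1, h, s):
    """Evaluate the cubic Hermite interpolant on [0, h] at local coordinate s.

    f0, d0: value and derivative at the left node; f1, d1: at the right node.
    Returns (value, derivative) of the interpolant at s.
    """
    t = s / h
    # Standard cubic Hermite basis functions on the unit interval.
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    value = f0 * h00 + h * d0 * h10 + f1 * h01 + h * d1 * h11
    # Derivatives of the basis functions with respect to t.
    g00 = 6 * t**2 - 6 * t
    g10 = 3 * t**2 - 4 * t + 1
    g01 = -6 * t**2 + 6 * t
    g11 = 3 * t**2 - 2 * t
    deriv = (f0 * g00 + h * d0 * g10 + f1 * g01 + h * d1 * g11) / h
    return value, deriv

def half_step(u, ux, h, shift, left_offset):
    """Interpolate between neighboring nodes and evolve by half a time step.

    Output entry i is built from input nodes (i + left_offset) and the node
    to its right (periodic). For advection, evolution is exact: the cell
    polynomial is evaluated at the cell center moved upstream by `shift`.
    """
    n = len(u)
    new_u = [0.0] * n
    new_ux = [0.0] * n
    for i in range(n):
        left = (i + left_offset) % n
        right = (left + 1) % n
        new_u[i], new_ux[i] = hermite_eval(
            u[left], ux[left], u[right], ux[right], h, 0.5 * h - shift
        )
    return new_u, new_ux

def step(u, ux, h, c, dt):
    """One full time step: primal grid -> dual grid -> primal grid.

    Assumes |c| * dt <= h so the evaluation point stays inside the cell.
    """
    shift = 0.5 * c * dt
    u, ux = half_step(u, ux, h, shift, 0)    # nodes -> cell centers
    return half_step(u, ux, h, shift, -1)    # cell centers -> nodes

if __name__ == "__main__":
    n, c = 32, 1.0
    h = 1.0 / n
    dt = 0.5 * h
    u = [math.sin(2 * math.pi * i * h) for i in range(n)]
    ux = [2 * math.pi * math.cos(2 * math.pi * i * h) for i in range(n)]
    for _ in range(32):
        u, ux = step(u, ux, h, c, dt)
    t_final = 32 * dt
    err = max(abs(u[i] - math.sin(2 * math.pi * (i * h - c * t_final)))
              for i in range(n))
    print("max error:", err)
```

Each output entry depends only on the two input nodes bounding its cell, which is the compact stencil that makes the method attractive for many-core hardware. In a two-kernel variant one would presumably store the interpolant coefficients between the interpolation and evolution stages, at the cost of extra global memory.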
TH was supported in part by NSF Grant DMS-1418871. TW and JC were supported in part by NSF Grant DMS-1216674. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.