Abstract
Solving an initial value problem for a large system of ordinary differential equations (ODEs) on a GPU is often memory bound, which makes optimizing the locality of memory references important. We exploit the limited access distance, a property of a large class of right-hand-side functions, to enable hexagonal or trapezoidal tiling across the stages of the ODE method. Since previous work showed that the traditional approach of launching one workgroup per tile is worthwhile only for small access distances, we introduce an approach in which several workgroups cooperate on a tile (multi-workgroup tiling) and investigate several optimizations and variations. Finally, a detailed experimental evaluation using two different Runge–Kutta (RK) methods shows that multi-workgroup tiling outperforms the traditional single-workgroup tiling for large access distances.
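To make the role of the limited access distance concrete, the following minimal sketch computes which components each stage must cover so that the last stage is complete on a given tile. It assumes that component j of a stage reads only components j-d to j+d of the previous stage, where d is the access distance. The function name `tile_ranges` and all parameters are hypothetical and chosen for illustration; the sketch shows only the dependency cone that gives tiles their sloped edges, not the authors' hexagonal, trapezoidal, or multi-workgroup implementation.

```python
def tile_ranges(lo, hi, d, stages, n):
    """Return, per stage, the half-open index range [start, end) that must be
    evaluated so that the final stage is complete on [lo, hi).

    Assumption (illustrative): component j of stage k reads only components
    j-d .. j+d of stage k-1, i.e. the access distance is d.
    n is the system size; ranges are clamped to [0, n).
    """
    ranges = []
    for k in range(stages):
        # Each stage still to come widens the required range by d per side.
        widen = d * (stages - 1 - k)
        ranges.append((max(0, lo - widen), min(n, hi + widen)))
    return ranges


if __name__ == "__main__":
    # A tile of 100 components, access distance 2, four stages.
    for k, (start, end) in enumerate(tile_ranges(100, 200, 2, 4, 1000)):
        print(f"stage {k}: components [{start}, {end})")
```

The required range grows by d on each side for every additional stage, so for a fixed number of stages a larger access distance means that more of each tile is overlap with its neighbours. This is consistent with the observation above that one workgroup per tile pays off only for small access distances.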
Acknowledgment
This work has been supported by the German Research Foundation (DFG) under grant KO 2252/3-2.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Korch, M., Werner, T. (2020). Multi-workgroup Tiling to Improve the Locality of Explicit One-Step Methods for ODE Systems with Limited Access Distance on GPUs. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2019. Lecture Notes in Computer Science, vol 12043. Springer, Cham. https://doi.org/10.1007/978-3-030-43229-4_1
DOI: https://doi.org/10.1007/978-3-030-43229-4_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-43228-7
Online ISBN: 978-3-030-43229-4
eBook Packages: Computer Science (R0)