Abstract
Physical and design limitations in the evolution of processors, e.g., frequency scaling and memory-bandwidth bottlenecks, push applications to scale across multiple-node configurations in addition to exploiting the power of each single node. This introduces new challenges when porting applications to such infrastructure, especially in heterogeneous environments. Domain decomposition, and handling the communication it necessitates, is not a trivial task. In general, tools cannot decide how to parallelize code automatically because of the semantics of general-purpose languages.
To spare scientists such problems, we introduce the Memory-Oblivious Data Access (MODA) technique and use it to scale code from a single node to multi-node configurations, supporting different architectures, without requiring changes to the application's source code. We present a technique that automatically identifies the necessary communication based on higher-level semantics; the extracted information enables tools to generate the code that handles this communication. A prototype implementing these techniques was developed and used to evaluate the approach. The results show the effectiveness of the techniques for scaling code on multi-core processors and on GPU-based machines. Comparing the ratio of achieved GFLOPS to the number of nodes across runs with different node counts shows a scaling efficiency of around 100%, measured on up to 100 nodes. The exception is the single-node GPU configuration, in which no communication, and hence no data movement between GPU and host memory, is needed, yielding higher GFLOPS.
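To illustrate the communication that such a tool must derive, the following is a minimal sketch (not the MODA implementation, and in Python rather than the generated C) of a one-dimensional domain decomposition with halo exchange for a 3-point stencil. The partitioning, the ghost cells, and the neighbour copies shown here correspond to what MODA identifies automatically from the higher-level semantics; in a real multi-node run each neighbour copy would become a message-passing send/receive pair.

```python
# Sketch: 1-D domain decomposition with halo exchange for a 3-point
# averaging stencil. Illustrates the communication pattern a tool like
# MODA must generate; it is NOT the actual MODA implementation.

def serial_step(u):
    """One serial 3-point averaging step; global boundaries stay fixed."""
    v = u[:]
    for i in range(1, len(u) - 1):
        v[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return v

def parallel_step(u, parts):
    """The same step computed over `parts` subdomains (halo width 1)."""
    n = len(u) // parts
    assert n * parts == len(u), "field size must divide evenly"
    # Decompose: each "rank" owns n cells plus one ghost cell per side.
    subs = [[0.0] + u[p * n:(p + 1) * n] + [0.0] for p in range(parts)]
    # Halo exchange: fill ghost cells from neighbouring subdomains
    # (one send/recv pair per shared boundary in a real MPI run).
    for p in range(parts):
        if p > 0:
            subs[p][0] = subs[p - 1][-2]   # left ghost <- left neighbour
        if p < parts - 1:
            subs[p][-1] = subs[p + 1][1]   # right ghost <- right neighbour
    # Local compute on owned cells; global boundary cells stay fixed.
    out = []
    for p in range(parts):
        s = subs[p]
        for i in range(1, n + 1):
            g = p * n + (i - 1)            # global index of owned cell
            if g == 0 or g == len(u) - 1:
                out.append(s[i])
            else:
                out.append((s[i - 1] + s[i] + s[i + 1]) / 3.0)
    return out

# The decomposed computation reproduces the serial result exactly.
u = [float(i * i % 7) for i in range(12)]
assert parallel_step(u, 3) == serial_step(u)
```

Note that only the halo exchange depends on the decomposition; the local stencil loop is unchanged. This separation is what allows the same application source to run unmodified on one node or many.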
Notes
1. The code is available at https://github.com/aimes-project/ShallowWaterEquations/.
Acknowledgements
This work was supported in part by the German Research Foundation (DFG) through the Priority Programme 1648 "Software for Exascale Computing" (SPPEXA, GZ: LU 1353/11-1). We also thank the Swiss National Supercomputing Centre (CSCS), which provided access to their machines to run the experiments. We further thank Prof. John Thuburn (University of Exeter) for his help in developing the code of the shallow water equations.
© 2019 Springer Nature Switzerland AG
Cite this paper
Jumah, N., Kunkel, J. (2019). Scalable Parallelization of Stencils Using MODA. In: Weiland, M., Juckeland, G., Alam, S., Jagode, H. (eds) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science, vol 11887. Springer, Cham. https://doi.org/10.1007/978-3-030-34356-9_13
DOI: https://doi.org/10.1007/978-3-030-34356-9_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34355-2
Online ISBN: 978-3-030-34356-9