Scalable Parallelization of Stencils Using MODA

Jumah, Nabeeh; Kunkel, Julian

doi:10.1007/978-3-030-34356-9_13


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11887)

Included in the following conference series:

International Conference on High Performance Computing


Abstract

Natural and design limitations in the evolution of processors, e.g., limits on frequency scaling and memory-bandwidth bottlenecks, push applications toward scaling across multiple nodes in addition to exploiting the power of each single node. This introduces new challenges in porting applications to such infrastructure, especially in heterogeneous environments. Decomposing the domain and handling the communication it requires is not a trivial task. In general, tools cannot decide automatically how to parallelize code, because of the semantics of general-purpose languages.
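Domain decomposition and the communication it implies can be made concrete with a small sketch. Assuming a 2D grid split into rectangular blocks on a non-periodic process grid, the hypothetical helpers `split_extent`, `halo_widths`, and `neighbors_to_exchange` below show how a tool that knows a stencil's relative access offsets could derive the ghost-cell (halo) widths and the neighboring blocks it must exchange data with. The names are illustrative only; they are not part of MODA or its prototype.

```python
# Sketch only: hypothetical helpers, not part of MODA or its prototype.
# Assumes a 2D grid split into rectangular blocks on a non-periodic
# process grid, with halo widths no larger than the smallest block.


def split_extent(n, parts, index):
    """Half-open range [start, end) of block `index` when n grid points
    are split as evenly as possible into `parts` blocks."""
    base, extra = divmod(n, parts)
    start = index * base + min(index, extra)
    end = start + base + (1 if index < extra else 0)
    return start, end


def halo_widths(offsets):
    """Ghost-cell width needed on each side, derived from the stencil's
    relative access offsets (dx, dy)."""
    return {
        "x-": max([-dx for dx, _ in offsets if dx < 0], default=0),
        "x+": max([dx for dx, _ in offsets if dx > 0], default=0),
        "y-": max([-dy for _, dy in offsets if dy < 0], default=0),
        "y+": max([dy for _, dy in offsets if dy > 0], default=0),
    }


def sign(v):
    return (v > 0) - (v < 0)


def neighbors_to_exchange(offsets, coords, dims):
    """Process-grid coordinates of the blocks that own data the stencil
    reads when it is applied to the block at `coords`."""
    px, py = coords
    nx, ny = dims
    needed = set()
    for dx, dy in offsets:
        sx, sy = sign(dx), sign(dy)
        if (sx, sy) == (0, 0):
            continue  # purely local access, no communication
        qx, qy = px + sx, py + sy
        if 0 <= qx < nx and 0 <= qy < ny:  # no neighbor past the domain edge
            needed.add((qx, qy))
    return sorted(needed)


if __name__ == "__main__":
    # Five-point stencil: the center point and its four direct neighbors.
    five_point = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    print([split_extent(10, 3, i) for i in range(3)])  # [(0, 4), (4, 7), (7, 10)]
    print(halo_widths(five_point))  # {'x-': 1, 'x+': 1, 'y-': 1, 'y+': 1}
    print(neighbors_to_exchange(five_point, (0, 0), (2, 2)))  # [(0, 1), (1, 0)]
```

The offsets are the key input here: when the access pattern is known at a higher level rather than hidden in arbitrary index arithmetic, deriving the halo and the communication partners reduces to a simple computation. Diagonal offsets such as `(1, 1)` automatically add the corner neighbor.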

To spare scientists these problems, we introduce the Memory-Oblivious Data Access (MODA) technique and use it to scale code from a single node to multiple nodes, on different architectures, without requiring changes to the application's source code. We present a technique that automatically identifies the necessary communication based on higher-level semantics; the extracted information enables tools to generate the code that handles this communication. We developed a prototype that implements these techniques and used it to evaluate the approach. The results show that the techniques scale code effectively on multi-core processors and on GPU-based machines. Comparing the achieved GFLOPS per node across runs on different numbers of nodes, up to 100 nodes, shows a scaling efficiency of around 100%. The exception is the single-node GPU configuration: it needs no communication, and hence no data movement between GPU and host memory, so it achieves higher GFLOPS.
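The efficiency measure described above, GFLOPS per node compared across runs, can be written as a short calculation. The sketch below uses a hypothetical helper `per_node_efficiency` and illustrative input numbers; they are not measurements from the paper, and the choice of baseline run is an assumption.

```python
# Sketch only: hypothetical helper; the numbers in the usage block are
# illustrative and are not results reported in the paper.


def per_node_efficiency(runs, baseline_nodes):
    """`runs` maps node count -> achieved GFLOPS. Returns node count ->
    efficiency, where 1.0 means the run achieves the same GFLOPS per node
    as the baseline run."""
    base = runs[baseline_nodes] / baseline_nodes
    return {n: (gflops / n) / base for n, gflops in sorted(runs.items())}


if __name__ == "__main__":
    # Illustrative values only: GFLOPS growing linearly with node count.
    runs = {2: 100.0, 4: 200.0, 8: 400.0}
    for nodes, eff in per_node_efficiency(runs, baseline_nodes=2).items():
        print(f"{nodes:3d} nodes: {eff:.0%}")
```

The baseline matters. Because the single-node GPU run avoids host-device transfers and therefore achieves higher GFLOPS, using it as the baseline would make every multi-node GPU run look less efficient; a sketch like this would more naturally take the smallest multi-node run as the reference. This is an assumption about how one might apply the measure, not a description of how the paper computed its figures.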


Notes

1.
The code is available at https://github.com/aimes-project/ShallowWaterEquations/.


Acknowledgements

This work was supported in part by the German Research Foundation (DFG) through the Priority Programme 1648 Software for Exascale Computing SPPEXA (GZ: LU 1353/11-1). We also thank the Swiss National Supercomputing Centre (CSCS), which provided access to its machines to run the experiments. We also thank Prof. John Thuburn (University of Exeter) for his help in developing the code of the shallow water equations.

Author information

Authors and Affiliations

Universität Hamburg, Hamburg, Germany
Nabeeh Jumah
University of Reading, Reading, UK
Julian Kunkel


Corresponding author

Correspondence to Nabeeh Jumah.

Editor information

Editors and Affiliations

University of Edinburgh, Edinburgh, UK
Michèle Weiland
Helmholtz-Zentrum Dresden-Rossendorf, Dresden, Sachsen, Germany
Guido Juckeland
Swiss National Supercomputing Centre, Lugano, Ticino, Switzerland
Sadaf Alam
University of Tennessee at Knoxville, Knoxville, TN, USA
Heike Jagode


About this paper

Cite this paper

Jumah, N., Kunkel, J. (2019). Scalable Parallelization of Stencils Using MODA. In: Weiland, M., Juckeland, G., Alam, S., Jagode, H. (eds) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science, vol 11887. Springer, Cham. https://doi.org/10.1007/978-3-030-34356-9_13

DOI: https://doi.org/10.1007/978-3-030-34356-9_13
Published: 03 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34355-2
Online ISBN: 978-3-030-34356-9
eBook Packages: Computer Science, Computer Science (R0)