Skip to main content

Scalable Parallelization of Stencils Using MODA

  • Conference paper
  • First Online:
High Performance Computing (ISC High Performance 2019)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11887))

Included in the following conference series:

Abstract

The natural and the design limitations of the evolution of processors, e.g., frequency scaling and memory bandwidth bottlenecks, push towards scaling applications on multiple-node configurations besides to exploiting the power of each single node. This introduced new challenges to porting applications to the new infrastructure, especially with the heterogeneous environments. Domain decomposition and handling the resulting necessary communication is not a trivial task. Parallelizing code automatically cannot be decided by tools in general as a result of the semantics of the general-purpose languages.

To allow scientists to avoid such problems, we introduce the Memory-Oblivious Data Access (MODA) technique, and use it to scale code to configurations ranging from a single node to multiple nodes, supporting different architectures, without requiring changes in the source code of the application. We present a technique to automatically identify necessary communication based on higher-level semantics. The extracted information enables tools to generate code that handles the communication. A prototype is developed to implement the techniques and used to evaluate the approach. The results show the effectiveness of using the techniques to scale code on multi-core processors and on GPU based machines. Comparing the ratios of the achieved GFLOPS to the number of nodes in each run, and repeating that on different numbers of nodes shows that the achieved scaling efficiency is around 100%. This was repeated with up to 100 nodes. An exception to this is the single-node configuration using a GPU, in which no communication is needed, and hence, no data movement between GPU and host memory is needed, which yields higher GFLOPS.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 79.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 99.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    The code is available at https://github.com/aimes-project/ShallowWaterEquations/.

References

  1. Bjørstad, P.E., Widlund, O.B.: Iterative methods for the solution of elliptic problems on regions partitioned into substructures. SIAM J. Numer. Anal. 23(6), 1097–1120 (1986)

    Article  MathSciNet  Google Scholar 

  2. Chan, T.F., Resasco, D.C.: A domain-decomposed fast Poisson solver on a rectangle. SIAM J. Sci. Stat. Comput. 8(1), s14–s26 (1987)

    Article  MathSciNet  Google Scholar 

  3. Christen, M., Schenk, O., Burkhart, H.: PATUS: a code generation and autotuning framework for parallel iterative stencil computations on modern microarchitectures. In: 2011 IEEE International Parallel & Distributed Processing Symposium, pp. 676–687. IEEE (2011)

    Google Scholar 

  4. Fox, G.C.: Domain decomposition in distributed and shared memory environments. In: Houstis, E.N., Papatheodorou, T.S., Polychronopoulos, C.D. (eds.) ICS 1987. LNCS, vol. 297, pp. 1042–1073. Springer, Heidelberg (1988). https://doi.org/10.1007/3-540-18991-2_62

    Chapter  Google Scholar 

  5. Fürlinger, K., et al.: DASH: data structures and algorithms with support for hierarchical locality. In: Lopes, L., et al. (eds.) Euro-Par 2014. LNCS, vol. 8806, pp. 542–552. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-14313-2_46

    Chapter  Google Scholar 

  6. Heybrock, S., et al.: Lattice QCD with domain decomposition on Intel® Xeon Phi™ co-processors. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 69–80. IEEE Press (2014)

    Google Scholar 

  7. Jum’ah, N., Kunkel, J.: Performance portability of earth system models with user-controlled GGDML code translation. In: Yokota, R., Weiland, M., Shalf, J., Alam, S. (eds.) ISC High Performance 2018. LNCS, vol. 11203, pp. 693–710. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-02465-9_50

    Chapter  Google Scholar 

  8. Jumah, N., Kunkel, J.: Automatic vectorization of stencil codes with the GGDML language extensions. In: Proceedings of the 5th Workshop on Programming Models for SIMD/Vector Processing, WPMVP 2019, pp. 2:1–2:7. ACM, New York (2019)

    Google Scholar 

  9. Jumah, N., Kunkel, J.M., Zängl, G., Yashiro, H., Dubos, T., Meurdesoif, T.: GGDML: icosahedral models language extensions. J. Comput. Sci. Technol. Updates 4(1), 1–10 (2017)

    Article  Google Scholar 

  10. Keyes, D.E.: Domain decomposition: a bridge between nature and parallel computers. Technical report, Institute for Computer Applications in Science and Engineering Hampton VA (1992)

    Google Scholar 

  11. Keyes, D.E., Gropp, W.D.: A comparison of domain decomposition techniques for elliptic partial differential equations and their parallel implementation. SIAM J. Sci. Stat. Comput. 8(2), s166–s202 (1987)

    Article  MathSciNet  Google Scholar 

  12. Lengauer, C., et al.: ExaStencils: advanced stencil-code engineering. In: Lopes, L., et al. (eds.) Euro-Par 2014. LNCS, vol. 8806, pp. 553–564. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-14313-2_47

    Chapter  Google Scholar 

  13. Maruyama, N., Nomura, T., Sato, K., Matsuoka, S.: Physis: an implicitly parallel programming model for stencil computations on large-scale GPU-accelerated supercomputers. In: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, p. 11. ACM (2011)

    Google Scholar 

  14. Niu, X., Coutinho, J.G.F., Luk, W.: A scalable design approach for stencil computation on reconfigurable clusters. In: 2013 23rd International Conference on Field programmable Logic and Applications, pp. 1–4. IEEE (2013)

    Google Scholar 

  15. Yount, C., Tobin, J., Breuer, A., Duran, A.: YASK–yet another stencil kernel: a framework for HPC stencil code-generation and tuning. In: 2016 Sixth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), pp. 30–39. IEEE (2016)

    Google Scholar 

Download references

Acknowledgements

This work was supported in part by the German Research Foundation (DFG) through the Priority Programme 1648 Software for Exascale Computing SPPEXA (GZ: LU 1353/11-1). We also thank the Swiss National Supercomputing Center (CSCS), who provided access to their machines to run the experiments. We also thank Prof. John Thuburn – University of Exeter, for his help to develop the code of the shallow water equations.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Nabeeh Jumah .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Jumah, N., Kunkel, J. (2019). Scalable Parallelization of Stencils Using MODA. In: Weiland, M., Juckeland, G., Alam, S., Jagode, H. (eds) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science(), vol 11887. Springer, Cham. https://doi.org/10.1007/978-3-030-34356-9_13

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-34356-9_13

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34355-2

  • Online ISBN: 978-3-030-34356-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics