Skip to main content

Expressing and Exploiting Multi-Dimensional Locality in DASH

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computational Science and Engineering ((LNCSE,volume 113))

Abstract

DASH is a realization of the PGAS (partitioned global address space) programming model in the form of a C++ template library. It provides a multi-dimensional array abstraction which is typically used as an underlying container for stencil- and dense matrix operations. Efficiency of operations on a distributed multi-dimensional array highly depends on the distribution of its elements to processes and the communication strategy used to propagate values between them. Locality can only be improved by employing an optimal distribution that is specific to the implementation of the algorithm, run-time parameters such as node topology, and numerous additional aspects. Application developers do not know these implications which also might change in future releases of DASH. In the following, we identify fundamental properties of distribution patterns that are prevalent in existing HPC applications. We describe a classification scheme of multi-dimensional distributions based on these properties and demonstrate how distribution patterns can be optimized for locality and communication avoidance automatically and, to a great extent, at compile-time.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD   109.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Notes

  1. 1.

    http://www.boost.org/community/generic_programming.html

  2. 2.

    https://www.lrz.de/services/compute/supermuc/systemdescription/

  3. 3.

    http://www.nersc.gov/users/computational-systems/cori/cori-phase-i/

References

  1. Agullo, E., Demmel, J., Dongarra, J., Hadri, B., Kurzak, J., Langou, J., Ltaief, H., Luszczek, P., Tomov, S.: Numerical linear algebra on emerging architectures: The plasma and magma projects. J. Phys.: Conf. Ser. 180, 012037 (2009). IOP Publishing

    Google Scholar 

  2. Alexandrescu, A.: Modern C++ Design: Generic Programming and Design Patterns Applied. Addison-Wesley, Boston (2001)

    Google Scholar 

  3. Ang, J.A., Barrett, R.F., Benner, R.E., Burke, D., Chan, C., Cook, J., Donofrio, D., Hammond, S.D., Hemmert, K.S., Kelly, S.M., Le, H., Leung, V.J., Resnick, D.R., Rodrigues, A.F., Shalf, J., Stark, D., Unat, D., Wright, N.J.: Abstract machine models and proxy architectures for exascale computing. In: Proceedings of the 1st International Workshop on Hardware-Software Co-design for High Performance Computing (Co-HPC ’14), pp. 25–32. IEEE Press, Piscataway (2014)

    Google Scholar 

  4. de Blas Cartón, C., Gonzalez-Escribano, A., Llanos, D.R.: Effortless and efficient distributed data-partitioning in linear algebra. In: 2010 12th IEEE International Conference on High Performance Computing and Communications (HPCC), pp. 89–97. IEEE (2010)

    Google Scholar 

  5. Chamberlain, B.L., Choi, S.E., Deitz, S.J., Iten, D., Litvinov, V.: Authoring user-defined domain maps in Chapel. In: CUG 2011 (2011)

    Google Scholar 

  6. Chamberlain, B.L., Choi, S.E., Lewis, E.C., Lin, C., Snyder, L., Weathersby, W.D.: ZPL: A machine independent programming language for parallel computers. IEEE Trans. Softw. Eng. 26 (3), 197–211 (2000)

    Article  Google Scholar 

  7. Chavarría-Miranda, D.G., Darte, A., Fowler, R., Mellor-Crummey, J.M.: Generalized multipartitioning for multi-dimensional arrays. In: Proceedings of the 16th International Parallel and Distributed Processing Symposium. p. 164. IEEE Computer Society (2002)

    Google Scholar 

  8. Choi, J., Demmel, J., Dhillon, I., Dongarra, J., Ostrouchov, S., Petitet, A., Stanley, K., Walker, D., Whaley, R.C.: ScaLAPACK: A portable linear algebra library for distributed memory computers – Design issues and performance. In: Applied Parallel Computing Computations in Physics, Chemistry and Engineering Science, pp. 95–106. Springer (1995)

    Google Scholar 

  9. Edwards, H.C., Sunderland, D., Porter, V., Amsler, C., Mish, S.: Manycore performance-portability: Kokkos multidimensional array library. Sci. Program. 20 (2), 89–114 (2012)

    Google Scholar 

  10. Fürlinger, K., Glass, C., Knüpfer, A., Tao, J., Hünich, D., Idrees, K., Maiterth, M., Mhedeb, Y., Zhou, H.: DASH: Data structures and algorithms with support for hierarchical locality. In: Euro-Par 2014 Workshops (Porto, Portugal). Lecture Notes in Computer Science, pp. 542–552. Springer (2014)

    Google Scholar 

  11. Hornung, R., Keasler, J.: The RAJA portability layer: overview and status. Tech. rep., Lawrence Livermore National Laboratory (LLNL), Livermore (2014)

    Google Scholar 

  12. Kamil, A., Zheng, Y., Yelick, K.: A local-view array library for partitioned global address space C++ programs. In: Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, p. 26. ACM (2014)

    Google Scholar 

  13. Kreutzer, M., Hager, G., Wellein, G., Fehske, H., Bishop, A.R.: A unified sparse matrix data format for efficient general sparse matrix-vector multiplication on modern processors with wide SIMD units. SIAM J. Sci. Comput. 36 (5), C401–C423 (2014)

    Article  MathSciNet  MATH  Google Scholar 

  14. Krishnan, M., Nieplocha, J.: SRUMMA: a matrix multiplication algorithm suitable for clusters and scalable shared memory systems. In: Proceedings of 18th International Parallel and Distributed Processing Symposium 2004, p. 70. IEEE (2004)

    Google Scholar 

  15. Naik, N.H., Naik, V.K., Nicoules, M.: Parallelization of a class of implicit finite difference schemes in computational fluid dynamics. Int. J. High Speed Comput. 5 (01), 1–50 (1993)

    Article  Google Scholar 

  16. Tate, A., Kamil, A., Dubey, A., Größlinger, A., Chamberlain, B., Goglin, B., Edwards, C., Newburn, C.J., Padua, D., Unat, D., et al.: Programming abstractions for data locality. Research report, PADAL Workshop 2014, April 28–29, Swiss National Supercomputing Center (CSCS), Lugano (Nov 2014)

    Google Scholar 

  17. Unat, D., Chan, C., Zhang, W., Bell, J., Shalf, J.: Tiling as a durable abstraction for parallelism and data locality. In: Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (2013)

    Google Scholar 

  18. Van De Geijn, R.A., Watts, J.: SUMMA: Scalable universal matrix multiplication algorithm. Concurr. Comput. 9 (4), 255–274 (1997)

    Google Scholar 

Download references

Acknowledgements

We gratefully acknowledge funding by the German Research Foundation (DFG) through the German Priority Program 1648 Software for Exascale Computing (SPPEXA).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Tobias Fuchs .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Fuchs, T., Fürlinger, K. (2016). Expressing and Exploiting Multi-Dimensional Locality in DASH. In: Bungartz, HJ., Neumann, P., Nagel, W. (eds) Software for Exascale Computing - SPPEXA 2013-2015. Lecture Notes in Computational Science and Engineering, vol 113. Springer, Cham. https://doi.org/10.1007/978-3-319-40528-5_15

Download citation

Publish with us

Policies and ethics