Runtime Support for Distributed Dynamic Locality
Single node hardware design is shifting to a heterogeneous nature and many of today’s largest HPC systems are clusters that combine heterogeneous compute device architectures. The need for new programming abstractions in the advancements to the Exascale era has been widely recognized and variants of the Partitioned Global Address Space (PGAS) programming model are discussed as a promising approach in this respect. In this work, we present a graph-based approach to provide runtime support for dynamic, distributed hardware locality, specifically considering heterogeneous systems and asymmetric, deep memory hierarchies. Our reference implementation dyloc leverages hwloc to provide high-level operations on logical hardware topology based on user-specified predicates such as filter- and group transformations and locality-aware partitioning. To facilitate integration in existing applications, we discuss adapters to maintain compatibility with the established hwloc API.
This work was partially supported by the German Research Foundation (DFG) by the German Priority Programme 1648 Software for Exascale Computing (SPPEXA) and by the German Federal Ministry of Education and Research (BMBF) through the MEPHISTO project, grant agreement 01IH16006B.
- 2.Chamberlain, B.L., Deitz, S.J., Iten, D., Choi, S.-E.: User-defined distributions and layouts in Chapel: philosophy and framework. In: Proceedings of 2nd USENIX Conference on Hot Topics in Parallelism, p. 12. USENIX Association (2010)Google Scholar
- 3.Da Costa, G., Fahringer, T., Gallego, J.A.R., Grasso, I., Hristov, A., Karatza, H., Lastovetsky, A., Marozzo, F., Petcu, D., Stavrinides, G., et al.: Exascale machines require new programming paradigms and runtimes. Supercomput. Front. Innov. 2(2), 6–27 (2015)Google Scholar
- 4.Fuchs, T., Fürlinger, K.: A multi-dimensional distributed array abstraction for PGAS. In: Proceedings of 18th IEEE International Conference on High Performance Computing and Communications (HPCC 2016), Sydney, Australia, pp. 1061–1068, December 2016Google Scholar
- 5.Fuchs, T., Fürlinger, K.: Expressing and exploiting multi-dimensional locality in DASH. In: Bungartz, H.-J., Neumann, P., Nagel, W.E. (eds.) Software for Exascale Computing - SPPEXA 2013-2015. LNCSE, vol. 113, pp. 341–359. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-40528-5_15 CrossRefGoogle Scholar
- 6.Fürlinger, K., Fuchs, T., Kowalewski, R.: DASH: a C++ PGAS library for distributed data structures and parallel algorithms. In: Proceedings of 18th IEEE International Conference on High Performance Computing and Communications (HPCC 2016), Sydney, Australia, pp. 983–990, December 2016Google Scholar
- 7.Goglin, B.: Managing the topology of heterogeneous cluster nodes with hardware locality (hwloc). In: International Conference on High Performance Computing & Simulation (HPCS 2014), July 2014, Bologna, Italy. IEEE (2014)Google Scholar
- 8.Goglin, B.: Exposing the locality of heterogeneous memory architectures to HPC applications. In: 1st ACM International Symposium on Memory Systems (MEMSYS 2016). ACM (2016)Google Scholar
- 9.Hajiaghayi, M., Johnson, T., Khani, M.R., Saha, B.: Hierarchical graph partitioning. In: Proceedings of 26th ACM Symposium on Parallelism in Algorithms and Architectures, pp. 51–60. ACM (2014)Google Scholar
- 10.Kamil, A.A., Yelick, K.A.: Hierarchical additions to the SPMD programming model. Technical report UCB/EECS-2012-20, EECS Department, University of California, Berkeley, February 2012Google Scholar
- 11.Tate, A., Kamil, A., Dubey, A., Größlinger, A., Chamberlain, B., Goglin, B., Edwards, C., Newburn, C.J., Padua, D., Unat, D., et al.: Programming abstractions for data locality. Research report, PADAL Workshop 2014, 28–29 April, Swiss National Supercomputing Center (CSCS), Lugano, Switzerland, November 2014Google Scholar
- 12.Yan, Y., Zhao, J., Guo, Y., Sarkar, V.: Hierarchical place trees: a portable abstraction for task parallelism and data movement. In: Gao, G.R., Pollock, L.L., Cavazos, J., Li, X. (eds.) LCPC 2009. LNCS, vol. 5898, pp. 172–187. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13374-9_12 CrossRefGoogle Scholar