Optimization of MPI-Process Mapping for Clusters with Angara Interconnect
An algorithm for optimizing the mapping of MPI processes is adapted to supercomputers with the Angara interconnect. The mapping algorithm is based on partitioning the communication pattern of a parallel program, so that the processes with the most intensive exchanges are assigned to nodes and processors connected by the links with the highest bandwidth. The algorithm finds a near-optimal distribution of processes over processor cores that minimizes the total execution time of exchanges between MPI processes. Results of optimized process placement obtained with the proposed method on small supercomputers are analyzed, as is the dependence of MPI program execution time on supercomputer and task parameters. A theoretical model is proposed for estimating the effect of mapping optimization on execution time for several types of supercomputer topologies. The prospects of using the implemented optimization library on large-scale supercomputers with the Angara interconnect are discussed.
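The mapping problem described above can be illustrated with a minimal sketch. The cost model and local-search heuristic below are assumptions for illustration, not the paper's actual algorithm: total exchange cost is modeled as traffic between each pair of ranks weighted by the hop distance between the nodes they are placed on, and pairwise swaps are accepted while they lower that cost (in the spirit of Kernighan–Lin refinement).

```python
# Hypothetical sketch of topology-aware MPI rank mapping by local search.
# traffic[i][j] -- communication volume between ranks i and j (symmetric);
# hops[a][b]    -- network distance between nodes a and b;
# place[i]      -- node assigned to rank i.

def mapping_cost(traffic, hops, place):
    """Total traffic-weighted distance for a given placement."""
    n = len(traffic)
    return sum(traffic[i][j] * hops[place[i]][place[j]]
               for i in range(n) for j in range(i + 1, n))

def optimize_mapping(traffic, hops):
    """Greedily swap rank pairs while the total exchange cost decreases."""
    n = len(traffic)
    place = list(range(n))  # start from the identity mapping
    improved = True
    while improved:
        improved = False
        for a in range(n):
            for b in range(a + 1, n):
                before = mapping_cost(traffic, hops, place)
                place[a], place[b] = place[b], place[a]
                if mapping_cost(traffic, hops, place) < before:
                    improved = True  # keep the beneficial swap
                else:
                    place[a], place[b] = place[b], place[a]  # undo

    return place

if __name__ == "__main__":
    # 4 ranks on a 4-node chain: ranks 0 and 3 exchange heavily,
    # so optimization should move them onto adjacent nodes.
    traffic = [[0, 1, 0, 9],
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [9, 0, 1, 0]]
    hops = [[abs(i - j) for j in range(4)] for i in range(4)]
    place = optimize_mapping(traffic, hops)
    print(place, mapping_cost(traffic, hops, place))
```

On this toy instance the identity mapping costs 30 (the heavy pair sits three hops apart), while the optimized placement brings the heavy pair to adjacent nodes. A production library would instead partition the communication graph hierarchically and account for per-link bandwidth, as the abstract describes.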
Keywords and phrases: parallel programming, process mapping, MPI, Angara interconnect