Abstract
An algorithm for optimizing the mapping of MPI processes is adapted for supercomputers with the Angara interconnect. The mapping algorithm is based on partitioning the communication pattern of a parallel program so that the processes with the most intensive exchanges are bound to the nodes/processors connected by the links with the highest bandwidth. The algorithm finds a near-optimal distribution of the program's processes over processor cores that minimizes the total execution time of exchanges between MPI processes. Results of the optimized process placement obtained with the proposed method on small supercomputers are analyzed, and the dependence of the MPI program execution time on supercomputer and task parameters is studied. A theoretical model is proposed for estimating the effect of mapping optimization on the execution time for several types of supercomputer topologies. The prospects of using the implemented optimization library on large-scale supercomputers with the Angara interconnect are discussed.
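The idea of communication-pattern-driven mapping can be illustrated with a minimal sketch. This is not the paper's partitioning algorithm; it is a simplified greedy heuristic, under the assumption that intra-node bandwidth exceeds inter-node bandwidth, so rank pairs with the heaviest exchanges are co-located on the same node. The function name `greedy_map` and the matrix format are illustrative.

```python
def greedy_map(comm, ranks_per_node):
    """Assign each MPI rank a node id, co-locating heavily communicating pairs.

    comm           -- symmetric matrix; comm[i][j] is the traffic volume
                      between ranks i and j (illustrative units)
    ranks_per_node -- how many ranks fit on one node
    Returns a list node_of where node_of[r] is the node id of rank r.
    """
    n = len(comm)
    # Consider rank pairs in order of decreasing exchanged volume.
    pairs = sorted(((comm[i][j], i, j)
                    for i in range(n) for j in range(i + 1, n)),
                   reverse=True)
    node_of = [None] * n
    free = []  # free[k] = remaining slots on node k

    def find_slot(need):
        # Reuse an existing node with enough free slots, else open a new one.
        for node, f in enumerate(free):
            if f >= need:
                return node
        free.append(ranks_per_node)
        return len(free) - 1

    for _, i, j in pairs:
        if node_of[i] is None and node_of[j] is None and ranks_per_node >= 2:
            node = find_slot(2)              # place the pair together
            node_of[i] = node_of[j] = node
            free[node] -= 2
        elif node_of[i] is None and node_of[j] is not None and free[node_of[j]] > 0:
            node_of[i] = node_of[j]          # join i to j's node if it fits
            free[node_of[i]] -= 1
        elif node_of[j] is None and node_of[i] is not None and free[node_of[i]] > 0:
            node_of[j] = node_of[i]
            free[node_of[j]] -= 1

    for r in range(n):                       # place any leftover ranks
        if node_of[r] is None:
            node = find_slot(1)
            node_of[r] = node
            free[node] -= 1
    return node_of
```

For example, with four ranks where 0 talks mostly to 1 and 2 mostly to 3, and two ranks per node, the heuristic puts {0, 1} on one node and {2, 3} on another, keeping the heavy traffic off the interconnect. Real tools replace this greedy pass with graph partitioning of the communication pattern, as in the paper.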
Additional information
(Submitted by V. V. Voevodin)
Cite this article
Khalilov, M.R., Timofeev, A.V. Optimization of MPI-Process Mapping for Clusters with Angara Interconnect. Lobachevskii J Math 39, 1188–1198 (2018). https://doi.org/10.1134/S1995080218090111