
Optimization of MPI-Process Mapping for Clusters with Angara Interconnect

  • Part 1. Special issue “High Performance Data Intensive Computing.” Editors: V. V. Voevodin, A. S. Simonov, and A. V. Lapin
Lobachevskii Journal of Mathematics

Abstract

An algorithm for optimizing the mapping of MPI processes is adapted to supercomputers with the Angara interconnect. The mapping algorithm is based on partitioning the communication pattern of a parallel program so that the processes that exchange data most intensively are assigned to the nodes and processors connected by the highest-bandwidth links. The algorithm finds a near-optimal distribution of the processes over processor cores that minimizes the total execution time of exchanges between MPI processes. The results of optimized process placement obtained with the proposed method on small supercomputers are analyzed, and the dependence of the MPI program execution time on supercomputer parameters and task parameters is examined. A theoretical model is proposed for estimating the effect of mapping optimization on the execution time for several types of supercomputer topologies. The prospects of using the implemented optimization library on large-scale supercomputers with the Angara interconnect are discussed.
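As a rough illustration of the mapping idea summarized above, the sketch below groups the most heavily communicating MPI ranks onto the same node. This is not the authors' implementation: the example communication matrix, the node size (CORES_PER_NODE), and the greedy grouping heuristic are assumptions made purely for illustration; the paper's algorithm instead partitions the communication-pattern graph.

```c
/*
 * Hypothetical sketch (not the authors' code): greedy grouping of MPI ranks
 * by communication volume. Ranks that exchange the most data are packed onto
 * the same node, approximating the goal of partitioning-based mapping.
 */
#include <stdio.h>
#include <stdbool.h>

#define NPROCS 8          /* number of MPI processes (assumed)  */
#define CORES_PER_NODE 4  /* cores per cluster node (assumed)   */

int main(void) {
    /* comm[i][j]: bytes exchanged between ranks i and j (example data only) */
    long comm[NPROCS][NPROCS] = {
        {  0, 900,  10,  10,   5,   5,   5,   5},
        {900,   0,  10,  10,   5,   5,   5,   5},
        { 10,  10,   0, 800,   5,   5,   5,   5},
        { 10,  10, 800,   0,   5,   5,   5,   5},
        {  5,   5,   5,   5,   0, 700,  10,  10},
        {  5,   5,   5,   5, 700,   0,  10,  10},
        {  5,   5,   5,   5,  10,  10,   0, 600},
        {  5,   5,   5,   5,  10,  10, 600,   0},
    };

    int  node_of[NPROCS];
    bool placed[NPROCS] = { false };
    int  node = 0;

    /* Greedy pass: seed each node with an unplaced rank, then repeatedly add
     * the unplaced rank with the largest traffic to the ranks already on it. */
    for (int seed = 0; seed < NPROCS; ++seed) {
        if (placed[seed]) continue;
        placed[seed]  = true;
        node_of[seed] = node;
        int slots = 1;
        while (slots < CORES_PER_NODE) {
            long best_w = -1;
            int  best_r = -1;
            for (int r = 0; r < NPROCS; ++r) {
                if (placed[r]) continue;
                long w = 0;
                for (int q = 0; q < NPROCS; ++q)
                    if (placed[q] && node_of[q] == node) w += comm[r][q];
                if (w > best_w) { best_w = w; best_r = r; }
            }
            if (best_r < 0) break;   /* no unplaced ranks left */
            placed[best_r]  = true;
            node_of[best_r] = node;
            ++slots;
        }
        ++node;
    }

    for (int r = 0; r < NPROCS; ++r)
        printf("rank %d -> node %d\n", r, node_of[r]);
    return 0;
}
```

On this example matrix the sketch places ranks 0–3 on node 0 and ranks 4–7 on node 1, so the heaviest exchanges stay inside a node; a partitioning-based mapper pursues the same goal but typically produces higher-quality groupings for large, irregular communication patterns.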



Author information

Corresponding author

Correspondence to M. R. Khalilov.

Additional information

(Submitted by V. V. Voevodin)


About this article


Cite this article

Khalilov, M.R., Timofeev, A.V. Optimization of MPI-Process Mapping for Clusters with Angara Interconnect. Lobachevskii J Math 39, 1188–1198 (2018). https://doi.org/10.1134/S1995080218090111

