Netloc: A Tool for Topology-Aware Process Mapping

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10659)

Abstract

Interconnection networks in parallel platforms may consist of thousands of nodes and hundreds of switches. The communication cost between tasks of a parallel application therefore varies significantly with their actual location in such platforms. Topology-aware process mapping consists of matching the application communication pattern to the network topology so as to reduce communication costs by placing heavily-communicating tasks close to each other on the hardware.
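
To make the mapping step concrete, here is a minimal sketch of matching a communication graph to a tree-shaped machine model with the Scotch C library, which the paper builds on. The ring-shaped communication pattern, the 2x4 machine, and the link costs are illustrative assumptions, not the paper's experimental setup.

    /* Minimal sketch: map a ring of 8 tasks onto a two-level tree
       (Scotch tleaf architecture). Build with: cc map.c -lscotch -lm
       Graph shape, machine shape, and link costs are assumptions. */
    #include <stdio.h>
    #include <scotch.h>

    #define NTASKS 8

    int main(void)
    {
        SCOTCH_Graph graph;
        SCOTCH_Arch  arch;
        SCOTCH_Strat strat;
        SCOTCH_Num   verttab[NTASKS + 1]; /* compact adjacency index */
        SCOTCH_Num   edgetab[2 * NTASKS]; /* 2 neighbors per task    */
        SCOTCH_Num   parttab[NTASKS];     /* computed target cores   */

        /* Communication graph: task i talks to tasks i-1 and i+1. */
        for (SCOTCH_Num i = 0; i < NTASKS; i++) {
            verttab[i]         = 2 * i;
            edgetab[2 * i]     = (i + NTASKS - 1) % NTASKS;
            edgetab[2 * i + 1] = (i + 1) % NTASKS;
        }
        verttab[NTASKS] = 2 * NTASKS;

        SCOTCH_graphInit(&graph);
        SCOTCH_graphBuild(&graph, 0, NTASKS, verttab, NULL, NULL, NULL,
                          2 * NTASKS, edgetab, NULL);

        /* Machine model: 2 switches of 4 cores each; crossing a switch
           costs 3, staying inside a switch costs 1 (assumed weights). */
        SCOTCH_Num sizetab[2] = { 2, 4 };
        SCOTCH_Num linktab[2] = { 3, 1 };
        SCOTCH_archInit(&arch);
        SCOTCH_archTleaf(&arch, 2, sizetab, linktab);

        SCOTCH_stratInit(&strat); /* empty strategy = default mapping */
        SCOTCH_graphMap(&graph, &arch, &strat, parttab);

        for (int i = 0; i < NTASKS; i++)
            printf("task %d -> core %d\n", i, (int)parttab[i]);

        SCOTCH_stratExit(&strat);
        SCOTCH_archExit(&arch);
        SCOTCH_graphExit(&graph);
        return 0;
    }

In practice, the machine model would be derived from the topology that Netloc discovers rather than hard-coded, and the communication graph from monitored or user-provided traffic.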

We show that our Netloc tool, which gathers network topologies in a generic way, can be combined with the state-of-the-art Scotch partitioner to compute topology-aware MPI process placements. Our experiments with a stencil application on a fat-tree machine show that this approach significantly improves the runtime in the vast majority of cases.
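
One hedged way to apply such a placement at runtime is to reorder MPI ranks so that each process ends up at its assigned position: MPI_Comm_split orders ranks within a color by key, so feeding it the computed permutation as the key performs the reordering. The reverse permutation below is only a placeholder for the output of the Netloc+Scotch tool chain, whose exact interface is not described here.

    /* Sketch: reorder MPI ranks according to a precomputed placement.
       The reverse permutation stands in for the real placement. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size, newrank;
        MPI_Comm placed;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Placeholder: a real run would use the per-rank position
           computed by the mapping tool instead of size-1-rank. */
        int key = size - 1 - rank;

        /* Same color everywhere; ranks in the new communicator are
           ordered by key, i.e., by the desired placement. */
        MPI_Comm_split(MPI_COMM_WORLD, 0, key, &placed);
        MPI_Comm_rank(placed, &newrank);
        printf("rank %d -> %d\n", rank, newrank);

        MPI_Comm_free(&placed);
        MPI_Finalize();
        return 0;
    }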

Keywords

Topology-aware mapping · Network topology · Process placement · MPI

Notes

Acknowledgements

Experiments presented in this paper were carried out using the PLAFRIM experimental testbed, which is being developed under the Inria PlaFRIM development action with support from Bordeaux INP, LaBRI, IMB, and other entities: Conseil Régional d’Aquitaine, Université de Bordeaux, CNRS, and ANR, in accordance with the Programme d’Investissements d’Avenir (see https://www.plafrim.fr/).

This work is partially funded under the ITEA3 COLOC project #13024.

References

1. Alverson, B., Froese, E., Kaplan, L., Roweth, D.: Cray XC series network. White Paper WP-Aries01-1112, Cray Inc. (2012)
2. Alverson, R., Roweth, D., Kaplan, L.: The Gemini system interconnect. In: 18th IEEE Symposium on High Performance Interconnects, pp. 83–87, August 2010
3. Barrett, R.F., Vaughan, C.T., Heroux, M.A.: MiniGhost: a miniapp for exploring boundary exchange strategies using stencil computations in scientific parallel computing. Technical report SAND 5294832, Sandia National Laboratories (2011)
4. Bosilca, G., Foyer, C., Jeannot, E., Mercier, G., Papauré, G.: Online dynamic monitoring of MPI communications. In: Rivera, F.F., Pena, T.F., Cabaleiro, J.C. (eds.) Euro-Par 2017. LNCS, vol. 10417, pp. 49–62. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-64203-1_4. Extended version: https://hal.inria.fr/hal-01485243
5. Broquedis, F., Clet-Ortega, J., Moreaud, S., Furmento, N., Goglin, B., Mercier, G., Thibault, S., Namyst, R.: hwloc: a generic framework for managing hardware affinities in HPC applications. In: Proceedings of the 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2010), pp. 180–186. IEEE Computer Society Press, Pisa, Italy, February 2010. http://hal.inria.fr/inria-00429889
6. Goglin, B., Hursey, J., Squyres, J.M.: netloc: towards a comprehensive view of the HPC system topology. In: Proceedings of the 5th International Workshop on Parallel Software Tools and Tool Infrastructures (PSTI 2014), held in conjunction with ICPP-2014, Minneapolis, MN, pp. 216–225, September 2014. http://hal.inria.fr/hal-01010599
7. Hoefler, T., Snir, M.: Generic topology mapping strategies for large-scale parallel architectures. In: Proceedings of the 2011 ACM International Conference on Supercomputing (ICS 2011), pp. 75–85. ACM, June 2011
8. Jeannot, E., Mercier, G., Tessier, F.: Process placement in multicore clusters: algorithmic issues and practical techniques. IEEE Trans. Parallel Distrib. Syst. 25(4), 993–1002 (2014)
9. Li, S., Hoefler, T., Snir, M.: NUMA-aware shared-memory collective communication for MPI. In: Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2013), pp. 85–96. ACM (2013)
10. Pellegrini, F., Roman, J.: Scotch: a software package for static mapping by dual recursive bipartitioning of process and architecture graphs. In: Liddell, H., Colbrook, A., Hertzberger, B., Sloot, P. (eds.) HPCN-Europe 1996. LNCS, vol. 1067, pp. 493–498. Springer, Heidelberg (1996). https://doi.org/10.1007/3-540-61142-8_588
11. Solernou, A., Thiyagalingam, J., Duta, M.C., Trefethen, A.E.: The effect of topology-aware process and thread placement on performance and energy. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds.) ISC 2013. LNCS, vol. 7905, pp. 357–371. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38750-0_27
12. Subramoni, H., Potluri, S., Kandalla, K., Barth, B., Vienne, J., Keasler, J., Tomko, K., Schulz, K., Moody, A., Panda, D.K.: Design of a scalable InfiniBand topology service to enable network-topology-aware placement of processes. In: Proceedings of the 2012 ACM/IEEE Conference on Supercomputing (SC 2012), Salt Lake City, UT, November 2012

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

1. Inria Bordeaux Sud-Ouest, LaBRI, University of Bordeaux, Bordeaux, France
