TAMM: A New Topology-Aware Mapping Method for Parallel Applications on the Tianhe-2A Supercomputer

Chen, Xinhai; Liu, Jie; Li, Shengguo; Xie, Peizhen; Chi, Lihua; Wang, Qinglin

doi:10.1007/978-3-030-05051-1_17

ORCID (Xinhai Chen): orcid.org/0000-0002-2931-4893

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11334)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing

Abstract

As high-performance computing systems grow, communication overhead between processors has become a key performance bottleneck. Default process-to-processor mapping strategies ignore the topology of the interconnection network, so messages may have to travel long distances across it. To improve communication locality, we propose a new topology-aware mapping method called TAMM. TAMM generates an accurate description of the communication pattern and the network topology, then applies a two-step optimization strategy to obtain an efficient mapping for various parallel applications. The strategy first extracts an appropriate subset of the idle computing resources on the underlying system and then constructs an optimized one-to-one mapping with a refined iterative algorithm. Experimental results demonstrate that TAMM effectively improves communication performance on the Tianhe-2A supercomputer.
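
The abstract does not spell out TAMM's algorithms, so the following is only a minimal sketch of the general two-step idea, not the authors' method. It assumes a symmetric communication matrix `comm` (traffic volume between process pairs) and a distance matrix `dist` (hop count between nodes), and it swaps in two generic techniques: a greedy selection of a compact set of idle nodes, followed by a pairwise-swap local search over a cost defined as traffic times distance. All function names and the toy data are hypothetical.

```python
from itertools import combinations


def mapping_cost(comm, dist, mapping):
    """Sum of traffic volume times network distance over all process pairs.

    mapping[i] is the node that hosts process i.
    """
    n = len(comm)
    return sum(comm[i][j] * dist[mapping[i]][mapping[j]]
               for i in range(n) for j in range(i + 1, n))


def select_nodes(dist, idle, k):
    """Step 1 (illustrative): greedily grow a compact subset of k idle nodes.

    Starts from the first idle node and repeatedly adds the idle node
    with the smallest total distance to the nodes chosen so far.
    """
    chosen = [idle[0]]
    remaining = set(idle[1:])
    while len(chosen) < k:
        best = min(sorted(remaining),
                   key=lambda c: sum(dist[c][s] for s in chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen


def refine(comm, dist, mapping):
    """Step 2 (illustrative): swap pairs of processes while the cost drops.

    This is a plain local search and may stop at a local optimum.
    """
    mapping = list(mapping)
    best = mapping_cost(comm, dist, mapping)
    improved = True
    while improved:
        improved = False
        for a, b in combinations(range(len(mapping)), 2):
            mapping[a], mapping[b] = mapping[b], mapping[a]
            cost = mapping_cost(comm, dist, mapping)
            if cost < best:
                best, improved = cost, True
            else:
                # Undo the swap: it did not reduce the cost.
                mapping[a], mapping[b] = mapping[b], mapping[a]
    return mapping, best


# Toy data (assumed): 4 processes, 6 nodes on a line, node 1 is busy.
comm = [[0, 1, 0, 10],
        [1, 0, 10, 0],
        [0, 10, 0, 1],
        [10, 0, 1, 0]]
dist = [[abs(i - j) for j in range(6)] for i in range(6)]

nodes = select_nodes(dist, idle=[0, 2, 3, 4, 5], k=4)
initial_cost = mapping_cost(comm, dist, nodes)  # process i -> nodes[i]
final_mapping, final_cost = refine(comm, dist, nodes)
print(nodes, initial_cost, final_mapping, final_cost)
```

The default placement here corresponds to assigning process i to the i-th selected node; the refinement step only ever accepts swaps that lower the cost, so the result is never worse than that starting point.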

Acknowledgment

This research was supported in part by the National Key Research and Development Program of China (2017YFB0202104), the National Natural Science Foundation of China (Grant Nos. 91530324 and 91430218), the China Postdoctoral Science Foundation (CPSF, Grant No. 2014M562570), and a Special Financial Grant from CPSF (Grant No. 2015T81127).

Author information

Authors and Affiliations

Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, 410073, China
Xinhai Chen, Jie Liu, Shengguo Li, Peizhen Xie & Qinglin Wang
Institute of Advanced Science and Technology, Hunan Institute of Traffic Engineering, Hengyang, 421001, China
Lihua Chi

Corresponding author

Correspondence to Jie Liu.

Editor information

Editors and Affiliations

Rutgers University, Newark, NJ, USA
Jaideep Vaidya
Guangzhou University, Guangzhou, China
Jin Li

About this paper

Cite this paper

Chen, X., Liu, J., Li, S., Xie, P., Chi, L., Wang, Q. (2018). TAMM: A New Topology-Aware Mapping Method for Parallel Applications on the Tianhe-2A Supercomputer. In: Vaidya, J., Li, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2018. Lecture Notes in Computer Science, vol 11334. Springer, Cham. https://doi.org/10.1007/978-3-030-05051-1_17

DOI: https://doi.org/10.1007/978-3-030-05051-1_17
Published: 07 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05050-4
Online ISBN: 978-3-030-05051-1
eBook Packages: Computer Science, Computer Science (R0)
