
An Efficient Method for Determining Full Point-to-Point Latency of Arbitrary Indirect HPC Networks

  • Chengchun Liu
  • Zhang Yang
  • Limin Xiao
  • Baicheng Yan
  • Zhihao Wang
  • Hongyun Tian
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11276)

Abstract

Point-to-point latency is one of the most important metrics for high-performance computer networks and is widely used in communication performance modeling, link-failure detection, and application optimization. However, determining the full-scale point-to-point latency of a large-scale HPC network is often difficult, because the number of required measurements grows with the square of the number of terminal nodes. In this paper, we propose an efficient method that generates measurement plans for arbitrary indirect HPC networks and reduces the number of required measurements from \(O(n^2)\) to \(m\), where \(n\) is the number of nodes and \(m\) the number of links; in modern indirect networks, \(m\) is often \(O(n)\). This significantly reduces the overhead of latency measurement. Both analysis and experiments show that the proposed method can reduce the measurement overhead of large-scale fat-tree networks by orders of magnitude.
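The abstract states the saving but not how a plan with at most \(m\) measurements can still yield every pairwise latency. The Python sketch below illustrates one way such a plan can work. It is not the authors' algorithm. It assumes that (1) each host pair uses a single fixed route, (2) end-to-end latency is the sum of the per-link latencies along that route, and (3) the network is a hypothetical two-level tree. Under these assumptions, every route is a 0/1 vector over the \(m\) links. A greedily chosen set of linearly independent routes therefore has at most \(m\) members, and it determines the latency of every other route exactly. The names `build_two_level_tree` and `plan_measurements` and the simulated link latencies are illustrative only.

```python
from fractions import Fraction
from itertools import combinations


def build_two_level_tree(num_edge, hosts_per_edge):
    """Hypothetical toy indirect network: hosts -> edge switches -> one core switch.

    Link ids 0..n-1 are host-to-edge links; link n+e connects edge switch e to the core.
    Returns (n, m, routes) where routes[(a, b)] lists the link ids on the a-b route.
    """
    n = num_edge * hosts_per_edge
    m = n + num_edge
    routes = {}
    for a, b in combinations(range(n), 2):
        ea, eb = a // hosts_per_edge, b // hosts_per_edge
        if ea == eb:
            routes[(a, b)] = [a, b]                  # through the shared edge switch
        else:
            routes[(a, b)] = [a, n + ea, n + eb, b]  # up to the core and back down
    return n, m, routes


def plan_measurements(routes, m):
    """Greedily pick host pairs whose link-incidence vectors are linearly independent.

    Returns (plan, coeffs). plan lists the pairs to measure. coeffs[pair] maps measured
    pairs to Fractions so that, under the additive model,
    latency(pair) == sum(c * latency(p) for p, c in coeffs[pair].items()).
    """
    basis = []  # (pivot, row, combo) with row == sum(combo[p] * vec(p))
    plan, coeffs = [], {}
    for pair, links in routes.items():
        row = [Fraction(0)] * m
        for link in links:
            row[link] += 1
        acc = {}  # invariant: vec(pair) == row + sum(acc[p] * vec(p))
        for pivot, brow, bcombo in basis:
            if row[pivot] != 0:
                f = row[pivot] / brow[pivot]
                row = [x - f * y for x, y in zip(row, brow)]
                for p, c in bcombo.items():
                    acc[p] = acc.get(p, 0) + f * c
        if any(row):
            # Independent of all chosen routes: this pair must be measured directly.
            pivot = next(i for i, x in enumerate(row) if x != 0)
            combo = {pair: Fraction(1)}
            for p, c in acc.items():
                combo[p] = combo.get(p, 0) - c
            basis.append((pivot, row, combo))
            plan.append(pair)
            coeffs[pair] = {pair: Fraction(1)}
        else:
            # Row reduced to zero: vec(pair) == sum(acc[p] * vec(p)), no measurement needed.
            coeffs[pair] = acc
    return plan, coeffs


n, m, routes = build_two_level_tree(num_edge=4, hosts_per_edge=4)
# Hypothetical ground-truth link latencies (arbitrary units), used only to simulate measurements.
link_latency = [1 + (i % 3) for i in range(m)]
true_latency = {pair: sum(link_latency[l] for l in links) for pair, links in routes.items()}

plan, coeffs = plan_measurements(routes, m)
measured = {pair: true_latency[pair] for pair in plan}  # the only measurements taken
predicted = {pair: sum(c * measured[p] for p, c in cs.items()) for pair, cs in coeffs.items()}

print(f"hosts={n} links={m} pairs={len(routes)} measured={len(plan)}")
```

Exact `Fraction` arithmetic avoids floating-point tolerance checks during elimination. For scale, a standard three-layer k-ary fat-tree has \(n = k^3/4\) hosts and \(m = 3k^3/4\) links (counting host links), so \(m = 3n\). With \(k = 48\), that is 27,648 hosts and about \(3.8 \times 10^8\) unordered host pairs, but only 82,944 links.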

Notes

Acknowledgement

The work in this paper is supported by the National Key R&D Program of China under Grant No. 2018YFB0203901, the Science Challenge Project (No. TZ2016002), and the National Natural Science Foundation of China under Grant No. 61772053. The authors would like to thank the reviewers for their valuable comments.


Copyright information

© IFIP International Federation for Information Processing 2018

Authors and Affiliations

  1. School of Computer Science and Engineering, Beihang University, Beijing, China
  2. Institute of Applied Physics and Computational Mathematics, Beijing, China
