Modeling UGAL on the Dragonfly Topology

  • Md Atiqul Mollah
  • Peyman Faizian
  • Md Shafayat Rahman
  • Xin Yuan
  • Scott Pakin
  • Michael Lang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10724)

Abstract

The Dragonfly topology has been proposed and deployed as the interconnection network topology for next-generation supercomputers. Practical routing algorithms developed for Dragonfly are based on a routing scheme called Universal Globally Adaptive Load-balanced routing with Global information (UGAL-G). While UGAL-G and UGAL-based practical routing schemes have been extensively studied, all existing results are based on simulation or measurement. There is no theoretical understanding of how UGAL-based routing schemes achieve their performance on a particular network configuration, or of what these schemes optimize for. In this work, we develop and validate throughput models for UGAL-G on the Dragonfly topology and identify a robust model that is both accurate and efficient across many Dragonfly variations. Given a traffic pattern, the proposed models estimate the aggregate throughput for that pattern accurately and effectively. Our results not only provide a mechanism to predict the communication performance of large-scale Dragonfly networks but also reveal the inner workings of UGAL-G, which furthers our understanding of UGAL-based routing on Dragonfly.

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Md Atiqul Mollah (1)
  • Peyman Faizian (1)
  • Md Shafayat Rahman (1)
  • Xin Yuan (1)
  • Scott Pakin (2)
  • Michael Lang (2)

  1. Florida State University, Tallahassee, USA
  2. Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, USA
