Abstract
The efficient implementation of collective communication is key to achieving good performance and scalability for communication patterns that involve global data movement and global control. It is also essential for enhancing the fault tolerance of a parallel computer, for instance to check the status of the nodes, run a distributed load-balancing algorithm, synchronize the local clocks, or perform performance monitoring. Support for multicast communication can therefore improve both the performance and the resource utilization of a parallel computer. The Quadrics interconnect (QsNET), which is used in some of the largest machines in the world, provides hardware support for multicast. The basic mechanism allows a message to be sent to any set of contiguous nodes in the same time it takes to send a unicast message. The two main collective communication primitives provided by the network software, barrier synchronization and broadcast, are each implemented in two different ways: using the hardware support when the nodes are contiguous, and otherwise using a balanced tree of unicast messages. This paper presents performance results for these collective communication services. They show, on the one hand, the outstanding performance of the hardware-based primitives, even in the presence of heavy background network traffic, and, on the other hand, the limited performance achieved by the software-based implementation.
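The choice between the two implementations described in the abstract can be illustrated with a small planning sketch. It is not the QsNET software: the names `plan_broadcast`, `HardwareMulticast` and `UnicastTree`, the heap-style tree layout, and the default tree arity of 2 are all assumptions made for illustration. The sketch only shows the decision rule: a contiguous destination set maps to a single hardware-multicast send, while any other set falls back to a balanced tree of unicast messages whose depth grows with the number of nodes.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class HardwareMulticast:
    """Hypothetical: one hardware-multicast send covering nodes lo..hi."""
    source: int
    lo: int
    hi: int


@dataclass
class UnicastTree:
    """Hypothetical software fallback: unicast sends in a balanced tree."""
    sends: List[Tuple[int, int]]  # (sender, receiver) pairs
    depth: int                    # number of forwarding levels in the tree


def is_contiguous(nodes: List[int]) -> bool:
    """True if the node IDs form one gap-free range, e.g. [4, 5, 6, 7]."""
    s = sorted(set(nodes))
    return bool(s) and s[-1] - s[0] + 1 == len(s)


def plan_broadcast(source: int, dests: List[int], arity: int = 2
                   ) -> Union[HardwareMulticast, UnicastTree]:
    """Use hardware multicast for contiguous destinations, else a tree.

    The arity (fan-out per node) is an assumed parameter, not a QsNET value.
    """
    if arity < 1:
        raise ValueError("arity must be >= 1")
    if is_contiguous(dests):
        s = sorted(set(dests))
        return HardwareMulticast(source, s[0], s[-1])
    # Heap-style layout: position 0 is the source; the children of
    # position i are positions arity*i + 1 .. arity*i + arity.
    order = [source] + sorted(set(dests) - {source})
    sends: List[Tuple[int, int]] = []
    depth_of = [0] * len(order)
    for i in range(1, len(order)):
        parent = (i - 1) // arity
        sends.append((order[parent], order[i]))
        depth_of[i] = depth_of[parent] + 1
    # Each level may still involve up to `arity` sequential sends per parent.
    return UnicastTree(sends, max(depth_of))
```

For example, `plan_broadcast(0, [4, 5, 6, 7])` yields a single `HardwareMulticast` covering nodes 4 to 7, whereas `plan_broadcast(0, [1, 3, 5, 7, 9, 11, 13])` yields seven unicast sends arranged in a binary tree of depth 3. Under the abstract's description, the first case costs roughly one unicast time regardless of the range size, while the second pays for every level of forwarding.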
The work was supported by the Spanish CICYT through contract TIC2000–1151–C07–05
Copyright information
© 2004 Springer Science+Business Media New York
About this chapter
Cite this chapter
Coll, S., Duato, J., Mora, F.J., Petrini, F., Hoisie, A. (2004). Collective Communication Patterns on the Quadrics Network. In: Getov, V., Gerndt, M., Hoisie, A., Malony, A., Miller, B. (eds) Performance Analysis and Grid Computing. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0361-3_6
DOI: https://doi.org/10.1007/978-1-4615-0361-3_6
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-5038-5
Online ISBN: 978-1-4615-0361-3
eBook Packages: Springer Book Archive