X-SRQ - Improving Scalability and Performance of Multi-core InfiniBand Clusters

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 5205)

Abstract

To improve the scalability of InfiniBand on large-scale clusters, Open MPI introduced a protocol known as B-SRQ [2]. This protocol was shown to provide much better memory utilization of send and receive buffers for a wide variety of benchmarks and real-world applications.

Unfortunately, B-SRQ increases the number of connections between communicating peers: while addressing one scalability problem of InfiniBand, the protocol introduced another. To alleviate the connection-scalability problem of B-SRQ, a small enhancement to the reliable connection transport was requested that allows multiple shared receive queues to be attached to a single reliable connection. This modified transport is now known as the extended reliable connection (XRC) transport.
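
To make the connection-count argument concrete, the following back-of-the-envelope model sketches the scaling difference; it is illustrative only, not code or data from the paper. It assumes B-SRQ opens one reliable-connection queue pair (QP) per remote process per receive-buffer bucket, while X-SRQ needs only one XRC QP per remote node because several shared receive queues can be reached over a single connection; the cluster parameters are hypothetical.

    /* Illustrative model only; the per-scheme QP accounting and all
     * parameters are assumptions, not taken from the paper. */
    #include <stdio.h>

    int main(void)
    {
        const long nodes          = 1024; /* hypothetical cluster size  */
        const long cores_per_node = 16;   /* MPI processes per node     */
        const long buckets        = 8;    /* B-SRQ receive-buffer sizes */

        long procs = nodes * cores_per_node;

        /* QPs a single process must hold under each scheme (assumed):
         * B-SRQ: one RC QP per remote process per bucket;
         * X-SRQ: one XRC QP per remote node. */
        long qp_bsrq = (procs - 1) * buckets;
        long qp_xsrq = nodes - 1;

        printf("B-SRQ QPs per process: %ld\n", qp_bsrq);
        printf("X-SRQ QPs per process: %ld\n", qp_xsrq);
        printf("reduction factor:      ~%ldx\n", qp_bsrq / qp_xsrq);
        return 0;
    }

Under these assumptions the per-process QP count drops by roughly a factor of cores_per_node × buckets, which is what keeps connection state manageable on multi-core nodes.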

X-SRQ is a new transport protocol in Open MPI, based on B-SRQ, that takes advantage of this improvement in connection scalability. This paper introduces the X-SRQ protocol and details its significantly improved scalability over B-SRQ, including a reduction of the memory footprint of connection state by as much as two orders of magnitude on large-scale multi-core systems. In addition to improving scalability, the performance of latency-sensitive collective operations is improved by up to 38%, while the variability of results is significantly decreased. A detailed analysis of the improved memory scalability as well as the improved performance is presented.
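
The memory-footprint claim can be sketched with the same hypothetical parameters plus an assumed per-QP connection-state cost of about 1 KiB; that constant is chosen for illustration and does not come from the paper, but it shows how a reduction of roughly two orders of magnitude can arise on a 16-core node.

    /* Rough memory-footprint sketch; the per-QP state size and the QP
     * accounting are assumptions for illustration only. */
    #include <stdio.h>

    int main(void)
    {
        const long nodes          = 1024;
        const long cores_per_node = 16;
        const long buckets        = 8;
        const long bytes_per_qp   = 1024; /* assumed connection state per QP */

        long procs = nodes * cores_per_node;

        /* Aggregate connection state held by all processes on one node. */
        double bsrq = (double)cores_per_node * (procs - 1) * buckets * bytes_per_qp;
        double xsrq = (double)cores_per_node * (nodes - 1) * bytes_per_qp;

        printf("B-SRQ connection state per node: %.0f MiB\n", bsrq / (1024.0 * 1024.0));
        printf("X-SRQ connection state per node: %.0f MiB\n", xsrq / (1024.0 * 1024.0));
        printf("reduction: ~%.0fx\n", bsrq / xsrq);
        return 0;
    }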

References

  1. Gabriel, E., Fagg, G.E., Bosilca, G., Angskun, T., Dongarra, J.J., Squyres, J.M., Sahay, V., Kambadur, P., Barrett, B., Lumsdaine, A., Castain, R.H., Daniel, D.J., Graham, R.L., Woodall, T.S.: Open MPI: goals, concept, and design of a next generation MPI implementation. In: Proceedings, 11th European PVM/MPI Users’ Group Meeting (2004)

  2. Brightwell, R., Maccabe, A.B.: Scalability limitations of VIA-based technologies in supporting MPI. In: Proceedings of the Fourth MPI Developers’ and Users’ Conference (2000)

  3. Shipman, G.M., Woodall, T.S., Graham, R.L., Maccabe, A.B., Bridges, P.G.: InfiniBand scalability in Open MPI. In: International Parallel and Distributed Processing Symposium (IPDPS 2006) (2006)

  4. Koop, M.J., Sur, S., Gao, Q., Panda, D.K.: High performance MPI design using unreliable datagram for ultra-scale InfiniBand clusters. In: ICS 2007: Proceedings of the 21st Annual International Conference on Supercomputing, pp. 180–189. ACM, New York (2007)

  5. Shipman, G.M., Brightwell, R., Barrett, B., Squyres, J.M., Bloch, G.: Investigations on InfiniBand: efficient network buffer utilization at scale. In: Proceedings, Euro PVM/MPI, Paris, France (2007)

  6. Brightwell, R., Maccabe, A.B., Riesen, R.: Design, implementation, and performance of MPI on Portals 3.0. International Journal of High Performance Computing Applications 17(1) (2003)

  7. Squyres, J.M., Lumsdaine, A.: The component architecture of Open MPI: enabling third-party collective algorithms. In: Getov, V., Kielmann, T. (eds.) Proceedings, 18th ACM International Conference on Supercomputing, Workshop on Component Models and Systems for Grid Applications, St. Malo, France, pp. 167–185. Springer, Heidelberg (2004)

  8. Reussner, R., Sanders, P., Prechelt, L., Müller, M.: SKaMPI: a detailed, accurate MPI benchmark. In: Alexandrov, V.N., Dongarra, J. (eds.) PVM/MPI 1998. LNCS, vol. 1497, pp. 52–59. Springer, Heidelberg (1998)

Editor information

Alexey Lastovetsky, Tahar Kechadi, Jack Dongarra

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shipman, G.M., Poole, S., Shamis, P., Rabinovitz, I. (2008). X-SRQ - Improving Scalability and Performance of Multi-core InfiniBand Clusters. In: Lastovetsky, A., Kechadi, T., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2008. Lecture Notes in Computer Science, vol 5205. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87475-1_11

  • DOI: https://doi.org/10.1007/978-3-540-87475-1_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87474-4

  • Online ISBN: 978-3-540-87475-1

  • eBook Packages: Computer Science (R0)
