Exploiting Non-blocking Remote Memory Access Communication in Scientific Benchmarks

Tipparaju, Vinod; Krishnan, Manojkumar; Nieplocha, Jarek; Santhanaraman, Gopalakrishnan; Panda, Dhabaleswar

doi:10.1007/978-3-540-24596-4_27

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2913)

Included in the following conference series:

International Conference on High-Performance Computing

Abstract

This paper describes a comparative performance study of the MPI and Remote Memory Access (RMA) communication models in the context of four scientific benchmarks: NAS MG, NAS CG, SUMMA matrix multiplication, and Lennard-Jones molecular dynamics, run on clusters with the Myrinet network. RMA communication is shown to deliver a consistent performance advantage over MPI, in some cases an improvement of as much as 50%. The benefits of using non-blocking RMA to overlap computation with communication are discussed.
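
The overlap idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not use any real RMA library: the names `REMOTE_BLOCKS`, `remote_get`, `compute`, `blocking_sum`, and `overlapped_sum` are hypothetical, and a background thread with a short sleep stands in for a non-blocking one-sided get over the network. The sketch only shows the general double-buffering pattern: request the next block before computing on the current one, then wait for the request to complete.

```python
from concurrent.futures import ThreadPoolExecutor
import time

# Hypothetical stand-in for memory owned by a remote process: five blocks of four values.
REMOTE_BLOCKS = [[float(i + 10 * b) for i in range(4)] for b in range(5)]


def remote_get(block_id, latency=0.01):
    """Emulate a one-sided get: sleep to model network latency, then return a copy."""
    time.sleep(latency)
    return list(REMOTE_BLOCKS[block_id])


def compute(block):
    """Placeholder local computation on a fetched block (sum of squares)."""
    return sum(x * x for x in block)


def blocking_sum(num_blocks):
    """Fetch each block and only then compute on it, so nothing overlaps."""
    total = 0.0
    for b in range(num_blocks):
        total += compute(remote_get(b))
    return total


def overlapped_sum(num_blocks):
    """Double buffering: request block b+1 before computing on block b."""
    total = 0.0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(remote_get, 0)  # issue the first get without waiting
        for b in range(num_blocks):
            current = pending.result()  # wait for the outstanding get to complete
            if b + 1 < num_blocks:
                pending = pool.submit(remote_get, b + 1)  # prefetch the next block
            total += compute(current)  # runs while the prefetch is in flight
    return total


if __name__ == "__main__":
    n = len(REMOTE_BLOCKS)
    print(blocking_sum(n) == overlapped_sum(n))
```

Both functions produce the same result. In the overlapped version the wait for each block can be partly hidden behind the computation on the previous one, which is the kind of benefit the abstract attributes to non-blocking RMA. How much is actually hidden depends on the network, the library, and the ratio of computation to communication.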

Author information

Authors and Affiliations

Pacific Northwest National Laboratory, Richland, WA, 99352, USA
Vinod Tipparaju, Manojkumar Krishnan & Jarek Nieplocha
Ohio State University, Columbus, OH, 43210, USA
Gopalakrishnan Santhanaraman & Dhabaleswar Panda

Editor information

Editors and Affiliations

University of Southern California, Los Angeles, CA 90089-2562
Timothy Mark Pinkston
Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089-2562, USA
Viktor K. Prasanna

About this paper

Cite this paper

Tipparaju, V., Krishnan, M., Nieplocha, J., Santhanaraman, G., Panda, D. (2003). Exploiting Non-blocking Remote Memory Access Communication in Scientific Benchmarks. In: Pinkston, T.M., Prasanna, V.K. (eds) High Performance Computing - HiPC 2003. HiPC 2003. Lecture Notes in Computer Science, vol 2913. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24596-4_27

DOI: https://doi.org/10.1007/978-3-540-24596-4_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20626-2
Online ISBN: 978-3-540-24596-4
eBook Packages: Springer Book Archive
