Abstract
This paper presents a comparative performance study of the MPI and Remote Memory Access (RMA) communication models in the context of four scientific benchmarks: NAS MG, NAS CG, SUMMA matrix multiplication, and Lennard-Jones molecular dynamics, run on clusters with the Myrinet network. RMA communication is shown to deliver a consistent performance advantage over MPI, with improvements of as much as 50% in some cases. The benefits of using non-blocking RMA to overlap computation with communication are discussed.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tipparaju, V., Krishnan, M., Nieplocha, J., Santhanaraman, G., Panda, D. (2003). Exploiting Non-blocking Remote Memory Access Communication in Scientific Benchmarks. In: Pinkston, T.M., Prasanna, V.K. (eds) High Performance Computing - HiPC 2003. HiPC 2003. Lecture Notes in Computer Science, vol 2913. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24596-4_27
DOI: https://doi.org/10.1007/978-3-540-24596-4_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20626-2
Online ISBN: 978-3-540-24596-4
eBook Packages: Springer Book Archive