
Comparison of Parallel Programming Models on Clusters of SMP Nodes

Conference paper
In: Modeling, Simulation and Optimization of Complex Processes

Summary

Most HPC systems are clusters of shared memory nodes. Parallel programming must therefore combine distributed memory parallelization across the node interconnect with shared memory parallelization inside each node. This paper compares several hybrid MPI+OpenMP programming models with pure MPI and analyzes their strengths and weaknesses on clusters of SMP nodes, where several mismatch problems arise between the (hybrid) programming schemes and the hybrid hardware architecture. Benchmark results on a Myrinet cluster and on recent Cray, NEC, IBM, Hitachi, SUN and SGI platforms show that the hybrid masteronly programming model can be used efficiently on some vector-type systems as well as on clusters of dual-CPU nodes. On other systems, a single CPU cannot saturate the inter-node network, so the commonly used masteronly model suffers from insufficient inter-node bandwidth. The paper analyzes strategies to overcome the typical drawbacks of this easily usable programming scheme on systems with weaker interconnects. Best performance is achieved by overlapping communication and computation, but this scheme sacrifices ease of use.
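The masteronly style discussed in the summary (all MPI communication issued outside of OpenMP parallel regions by one thread per SMP node) is straightforward to express. The following C sketch is purely illustrative and not taken from the paper; the halo-exchange partner, array sizes, and the relaxation-style update are placeholder assumptions.

```c
/* Minimal sketch of the hybrid "masteronly" MPI+OpenMP pattern:
 * MPI is called only outside of OpenMP parallel regions, so a single
 * thread per SMP node drives the inter-node communication, while all
 * threads of the node share the computation phase.
 * Problem size and communication partner are hypothetical. */
#include <mpi.h>
#include <omp.h>
#include <stdlib.h>

#define N 1000000

int main(int argc, char **argv)
{
    int provided, rank, size;
    /* MPI_THREAD_FUNNELED: only the master thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *local = malloc(N * sizeof(double));
    double *halo  = malloc(N * sizeof(double));
    int partner = (rank + 1) % size;          /* illustrative neighbour */

    for (int iter = 0; iter < 10; ++iter) {
        /* Communication phase: master thread only; the other CPUs of
         * the node are idle during this phase. */
        MPI_Sendrecv(local, N, MPI_DOUBLE, partner, 0,
                     halo,  N, MPI_DOUBLE, partner, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        /* Computation phase: all threads of the SMP node participate. */
        #pragma omp parallel for
        for (int i = 0; i < N; ++i)
            local[i] = 0.5 * (local[i] + halo[i]);
    }

    free(local);
    free(halo);
    MPI_Finalize();
    return 0;
}
```

The drawback named in the summary is visible here: during the communication phase only one CPU per node is active, which hurts on systems where a single CPU cannot saturate the inter-node network. The overlapping scheme addresses this by letting some threads compute while communication is in progress, at the cost of a more complicated program structure.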




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rabenseifner, R., Wellein, G. (2005). Comparison of Parallel Programming Models on Clusters of SMP Nodes. In: Bock, H.G., Phu, H.X., Kostina, E., Rannacher, R. (eds) Modeling, Simulation and Optimization of Complex Processes. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-27170-8_31

