Experiences Using OpenMP Based on Compiler Directed Software DSM on a PC Cluster

  • Matthias Hess
  • Gabriele Jost
  • Matthias Müller
  • Roland Rühle
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2716)


In this work we report on our experiences running OpenMP programs on a commodity PC cluster that uses a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss the performance differences.


Keywords: Shared Memory, Message Passing, Home Node, Distribute Shared Memory, Shared Memory System





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Matthias Hess (1)
  • Gabriele Jost (2)
  • Matthias Müller (1)
  • Roland Rühle (1)

  1. HLRS, Stuttgart, Germany
  2. NASA Ames Research Center, Moffett Field, USA
