Experiences Using OpenMP Based on Compiler Directed Software DSM on a PC Cluster
In this work we report on our experiences running OpenMP programs on a commodity cluster of PCs that uses a software distributed shared memory (DSM) system. We describe our test environment and report on the performance of a subset of the NAS Parallel Benchmarks that have been automatically parallelized for OpenMP. We compare the performance of the OpenMP implementations with that of their message passing counterparts and discuss the performance differences.
Keywords: Shared Memory, Message Passing, Home Node, Distributed Shared Memory, Shared Memory System