A Methodology Approach to Compare Performance of Parallel Programming Models for Shared-Memory Architectures
Most current HPC applications are built on complex, irregular data structures and techniques such as linear algebra, graph algorithms, and resource management, and they target new platforms whose computation units vary in capacity and features. On platforms with many cores of differing performance characteristics, selecting the best programming model for a given algorithm becomes a challenge. Existing approaches in the literature range from comparing the corresponding programming models' primitives in isolation to evaluating complete benchmark suites; our study shows that none of them provides enough information for an HPC application to guide a programming-model selection. In addition, modern platforms are reshaping the memory hierarchy, evolving toward larger shared and private caches and NUMA regions, which makes the memory wall an issue whose impact depends on an application's memory access patterns. In this work, we propose a methodology based on parallel programming patterns that accounts for intra- and inter-socket communication. In this context, we analyze MPI, OpenMP, and the hybrid MPI/OpenMP solution in shared-memory environments. We demonstrate that the proposed comparison methodology can yield more accurate performance predictions for given HPC applications and is therefore a useful tool for selecting the appropriate parallel programming model.
Keywords: MPI · OpenMP · NUMA · HPC · Parallel programming patterns
This research was supported by the following grants: the Spanish Ministry of Science and Innovation (contract TIN2015-65316), the Generalitat de Catalunya (2014-SGR-1051), and the European Commission through the HiPEAC-3 Network of Excellence (FP7/ICT-217068).