A Methodology Approach to Compare Performance of Parallel Programming Models for Shared-Memory Architectures

  • Gladys Utrera
  • Marisa Gil
  • Xavier Martorell
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11973)

Abstract

Most current HPC applications are built on complex and irregular data structures and rely on techniques such as linear algebra, graph algorithms, and resource management, which demand new platforms whose computation units vary in capacity and features. Platforms with cores of differing performance characteristics make it challenging to select the best programming model for the algorithm being executed. Existing approaches in the literature range from comparing the programming models' primitives in isolation to evaluating complete benchmark suites. Our study shows that none of these provides enough information to select a programming model for a given HPC application. In addition, modern platforms are reshaping the memory hierarchy, evolving towards larger shared and private caches and NUMA regions, which makes the memory wall an issue whose impact depends on each application's memory access patterns. In this work, we propose a methodology based on parallel programming patterns that takes intra- and inter-socket communication into account. To this end, we analyze MPI, OpenMP, and hybrid MPI/OpenMP in shared-memory environments. We demonstrate that the proposed comparison methodology can yield more accurate performance predictions for given HPC applications and is therefore a useful tool for selecting the appropriate parallel programming model.
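
To make the intra- versus inter-socket distinction concrete, the sketch below is an illustrative example only (it is not taken from the paper's benchmarks) of the hybrid MPI/OpenMP style discussed in the abstract. It assumes one MPI rank per NUMA socket; all variable names and problem sizes are placeholders chosen for the example. OpenMP threads perform the work and the reduction inside a socket, while MPI combines the per-socket partial results.

/* Illustrative hybrid MPI/OpenMP map/reduce sketch (not the authors' code).
 * Assumption: one MPI rank per NUMA socket, OpenMP threads within a socket. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long n_local = 1 << 20;             /* elements owned by this rank */
    double *x = malloc(n_local * sizeof *x);
    double local_sum = 0.0;

    /* Intra-socket phase: OpenMP threads initialize and reduce local data.
     * First touch inside the parallel loop keeps pages on this socket. */
    #pragma omp parallel for reduction(+:local_sum)
    for (long i = 0; i < n_local; ++i) {
        x[i] = (double)(rank + 1);
        local_sum += x[i];
    }

    /* Inter-socket phase: MPI combines the per-socket partial results. */
    double global_sum = 0.0;
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f (ranks = %d)\n", global_sum, size);

    free(x);
    MPI_Finalize();
    return 0;
}

In a pure MPI version the intra-socket loop would instead be split across additional ranks communicating through shared memory, which is one of the trade-offs the proposed methodology is meant to expose.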

Keywords

MPI · OpenMP · NUMA · HPC · Parallel programming patterns

Notes

Acknowledgements

This research was supported by the following grants: the Spanish Ministry of Science and Innovation (contract TIN2015-65316), the Generalitat de Catalunya (2014-SGR-1051), and the European Commission through the HiPEAC-3 Network of Excellence (FP7/ICT-217068).

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Computer Architecture Department, Universitat Politècnica de Catalunya, Barcelona, Spain