OpenMP as a High-Level Specification Language for Parallelism

And its use in Evaluating Parallel Programming Systems
  • Max Grossman
  • Jun Shirako
  • Vivek Sarkar
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9903)


While OpenMP is the de facto standard for shared-memory parallel programming, a number of alternative programming models and runtime systems have emerged in recent years. Fairly evaluating these systems can be challenging and can require significant manual effort from researchers. Nevertheless, such comparisons are important, because they help advance both the available OpenMP runtimes and the research being done with these novel programming systems.

In this paper we present the OpenMP-to-X framework, an open source tool for mapping OpenMP constructs and APIs to other parallel programming systems. We apply OpenMP-to-X to the HClib parallel programming library and use it to enable a fair and objective comparison of performance and programmability among HClib, GNU OpenMP, and Intel OpenMP. We use this investigation to expose performance bottlenecks in both the Intel OpenMP and HClib runtimes, to motivate improvements to the HClib programming model and runtime, and to propose potential extensions to the OpenMP standard. Our performance analysis shows that, across a wide range of benchmarks, HClib exhibits significantly less performance volatility, with a median standard deviation of 1.03% in execution times, and outperforms the two OpenMP implementations on 15 of 24 benchmarks.
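The abstract summarizes run-to-run volatility as a median percent standard deviation of execution times. The following is a minimal sketch of how such a figure could be computed. It assumes that "percent standard deviation" means the sample standard deviation of repeated run times divided by their mean and expressed as a percentage (the coefficient of variation), and that the median is then taken across benchmarks. The paper's exact definition may differ. The function names, benchmark names, and timings below are hypothetical illustrations, not the authors' tooling or data.

```python
from statistics import mean, median, stdev


def percent_std_dev(times):
    """Sample standard deviation of repeated run times as a percentage of their mean.

    Assumption: this is what "percent standard deviation" means here.
    Requires at least two runs, since a sample standard deviation needs n >= 2.
    """
    if len(times) < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    return 100.0 * stdev(times) / mean(times)


def median_percent_std_dev(runs_by_benchmark):
    """Median of the per-benchmark percent standard deviations."""
    return median(percent_std_dev(t) for t in runs_by_benchmark.values())


if __name__ == "__main__":
    # Hypothetical execution times in seconds; NOT measurements from the paper.
    runs = {
        "bench_a": [2.00, 2.02, 1.98, 2.01, 1.99],
        "bench_b": [0.50, 0.51, 0.50, 0.49, 0.50],
        "bench_c": [10.0, 10.4, 9.7, 10.1, 9.8],
    }
    for name, times in runs.items():
        print(f"{name}: {percent_std_dev(times):.2f}%")
    print(f"median: {median_percent_std_dev(runs):.2f}%")
```

Using the sample (n - 1) standard deviation rather than the population form is a design choice. With only a handful of runs per benchmark the two can differ noticeably, so a reported volatility figure should state which form it uses.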

This work was supported in part by the Data Analysis and Visualization Cyberinfrastructure funded by NSF under grant OCI-0959097 and Rice University.

The authors would also like to acknowledge the contributions of Vivek Kumar, Nick Vrvilo, and Vincent Cave to the HClib project.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Department of Computer Science, Rice University, Houston, USA