
FAROS: A Framework to Analyze OpenMP Compilation Through Benchmarking and Compiler Optimization Analysis

  • Conference paper
  • First Online:
OpenMP: Portable Multi-Level Parallelism on Modern Systems (IWOMP 2020)

Abstract

Compilers optimize OpenMP programs differently than their serial elision. Early outlining of parallel regions and invocation of parallel code via OpenMP runtime functions are two of the most profound differences. Understanding the interplay between compiler optimizations, OpenMP compilation, and application performance is hard and usually requires specialized benchmarks and compilation analysis tools.
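
To make the effect of early outlining concrete, here is a minimal C sketch (not taken from the paper, and deliberately simplified relative to what any real compiler emits) of how a parallel region is extracted into an outlined function and invoked through the OpenMP runtime, e.g. via __kmpc_fork_call in LLVM's runtime:

    /* outlining_sketch.c -- illustrative only; compile with: clang -O2 -fopenmp outlining_sketch.c */
    #include <stdio.h>

    /* Original source: the compiler sees a structured parallel region. */
    void saxpy(int n, float a, float *x, float *y) {
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }

    /*
     * Conceptually (simplified), an OpenMP compiler outlines the region into a
     * separate function and replaces the pragma with a call into the OpenMP
     * runtime, roughly:
     *
     *   static void saxpy_outlined(..., int n, float a, float *x, float *y) {
     *       for (int i = lower; i < upper; ++i)   // per-thread bounds supplied by the runtime
     *           y[i] = a * x[i] + y[i];
     *   }
     *
     *   void saxpy(int n, float a, float *x, float *y) {
     *       __kmpc_fork_call(..., saxpy_outlined, n, a, x, y);  // LLVM OpenMP runtime entry point
     *   }
     *
     * The loop body now sits behind an opaque runtime call, one reason why
     * optimizations that fire on the serial elision may no longer apply.
     */

    int main(void) {
        float x[8] = {1, 2, 3, 4, 5, 6, 7, 8}, y[8] = {0};
        saxpy(8, 2.0f, x, y);
        printf("y[7] = %f\n", y[7]);
        return 0;
    }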

To this end, we present FAROS, an extensible framework that automates and structures the analysis of compiler optimization of OpenMP programs. FAROS provides a generic configuration interface to profile and analyze OpenMP applications using their native build configurations. Applying FAROS to a set of 39 OpenMP programs, including HPC applications and kernels, we show that OpenMP compilation hinders optimization for the majority of programs. Comparing single-threaded OpenMP execution to its sequential counterpart, we observed slowdowns of as much as 135.23%. In some cases, however, OpenMP compilation speeds up execution by as much as 25.48%, when OpenMP semantics help compiler optimization. Follow-up analysis of compiler optimization reports enables us to pinpoint the reasons without in-depth knowledge of the compiler. This information can be used to improve compilers and to bring performance on par through manual code refactoring.
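
For readers who want to reproduce this style of comparison by hand, the sketch below shows the underlying idea, assuming Clang/LLVM and a hypothetical kernel file kernel.c; FAROS automates these steps using each application's native build configuration, and the flags shown are standard Clang optimization-remark options:

    /* kernel.c -- hypothetical kernel used to contrast OpenMP compilation with
     * its serial elision. Compiling WITHOUT -fopenmp ignores the pragma and
     * yields the serial elision of the same source. */
    void scale(int n, double a, double *restrict v) {
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            v[i] *= a;
    }

    /*
     * Serial elision (pragma ignored), printing vectorization remarks:
     *   clang -O3 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -c kernel.c
     *
     * OpenMP compilation, same remarks, plus machine-readable remark files
     * (see https://llvm.org/docs/Remarks.html):
     *   clang -O3 -fopenmp -fsave-optimization-record \
     *         -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -c kernel.c
     *
     * To compare run times fairly, the OpenMP build is run with a single thread:
     *   OMP_NUM_THREADS=1 ./app_openmp
     *
     * A loop reported as vectorized in the serial elision but as a missed
     * optimization in the OpenMP build points to an optimization hindered by
     * outlining and the runtime call.
     */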


Notes

  1. Faros is a transliteration of the Greek word φάρος, which means lighthouse or beacon, an analogy to our framework, which is set up to guide the analysis of OpenMP compilation.

  2. https://github.com/ggeorgakoudis/FAROS.

  3. https://llvm.org/docs/Remarks.html.

  4. Simplified for the sake of brevity.

  5. https://llvm.org/docs/TestingGuide.html.

  6. https://gcc.gnu.org/onlinedocs/gccint/Testsuites.html.


Acknowledgments

The authors would like to thank the anonymous referees for their valuable comments and helpful suggestions. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344 (LLNL-CONF-810797), and was also partially supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of two U.S. Department of Energy organizations (Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering, and early testbed platforms, in support of the nation's exascale computing imperative.

Author information

Correspondence to Giorgis Georgakoudis.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Georgakoudis, G., Doerfert, J., Laguna, I., Scogland, T.R.W. (2020). FAROS: A Framework to Analyze OpenMP Compilation Through Benchmarking and Compiler Optimization Analysis. In: Milfeld, K., de Supinski, B., Koesterke, L., Klinkenberg, J. (eds.) OpenMP: Portable Multi-Level Parallelism on Modern Systems. IWOMP 2020. Lecture Notes in Computer Science, vol. 12295. Springer, Cham. https://doi.org/10.1007/978-3-030-58144-2_1


  • DOI: https://doi.org/10.1007/978-3-030-58144-2_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58143-5

  • Online ISBN: 978-3-030-58144-2

  • eBook Packages: Computer Science, Computer Science (R0)
