
FAROS: A Framework to Analyze OpenMP Compilation Through Benchmarking and Compiler Optimization Analysis

  • Conference paper
  • First Online:
OpenMP: Portable Multi-Level Parallelism on Modern Systems (IWOMP 2020)

Abstract

Compilers optimize OpenMP programs differently than their serial elision. Early outlining of parallel regions and invocation of parallel code via OpenMP runtime functions are two of the most profound differences. Understanding the interplay between compiler optimizations, OpenMP compilation, and application performance is hard and usually requires specialized benchmarks and compilation analysis tools.
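
To make the effect of early outlining concrete, here is a minimal C sketch (not taken from the paper, and deliberately simplified relative to what any real compiler emits) of how a parallel region is extracted into an outlined function and invoked through the OpenMP runtime, e.g. via __kmpc_fork_call in LLVM's runtime:

    /* outlining_sketch.c -- illustrative only; compile with: clang -O2 -fopenmp outlining_sketch.c */
    #include <stdio.h>

    /* Original source: the compiler sees a structured parallel region. */
    void saxpy(int n, float a, float *x, float *y) {
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }

    /*
     * Conceptually (simplified), an OpenMP compiler outlines the region into a
     * separate function and replaces the pragma with a call into the OpenMP
     * runtime, roughly:
     *
     *   static void saxpy_outlined(..., int n, float a, float *x, float *y) {
     *       for (int i = lower; i < upper; ++i)   // per-thread bounds supplied by the runtime
     *           y[i] = a * x[i] + y[i];
     *   }
     *
     *   void saxpy(int n, float a, float *x, float *y) {
     *       __kmpc_fork_call(..., saxpy_outlined, n, a, x, y);  // LLVM OpenMP runtime entry point
     *   }
     *
     * The loop body now sits behind an opaque runtime call, one reason why
     * optimizations that fire on the serial elision may no longer apply.
     */

    int main(void) {
        float x[8] = {1, 2, 3, 4, 5, 6, 7, 8}, y[8] = {0};
        saxpy(8, 2.0f, x, y);
        printf("y[7] = %f\n", y[7]);
        return 0;
    }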

To this end, we present FAROS, an extensible framework that automates and structures the analysis of compiler optimization of OpenMP programs. FAROS provides a generic configuration interface to profile and analyze OpenMP applications using their native build configurations. Applying FAROS to a set of 39 OpenMP programs, including HPC applications and kernels, we show that OpenMP compilation hinders optimization for the majority of programs. Comparing single-threaded OpenMP execution to its sequential counterpart, we observed slowdowns of as much as 135.23%. In some cases, however, OpenMP compilation speeds up execution by as much as 25.48%, when OpenMP semantics help compiler optimization. Follow-up analysis of compiler optimization reports enables us to pinpoint the reasons without in-depth knowledge of the compiler. This information can be used to improve compilers and to bring performance on par through manual code refactoring.
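
For readers who want to reproduce this style of comparison by hand, the sketch below shows the underlying idea, assuming Clang/LLVM and a hypothetical kernel file kernel.c; FAROS automates these steps using each application's native build configuration, and the flags shown are standard Clang optimization-remark options:

    /* kernel.c -- hypothetical kernel used to contrast OpenMP compilation with
     * its serial elision. Compiling WITHOUT -fopenmp ignores the pragma and
     * yields the serial elision of the same source. */
    void scale(int n, double a, double *restrict v) {
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            v[i] *= a;
    }

    /*
     * Serial elision (pragma ignored), printing vectorization remarks:
     *   clang -O3 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -c kernel.c
     *
     * OpenMP compilation, same remarks, plus machine-readable remark files
     * (see https://llvm.org/docs/Remarks.html):
     *   clang -O3 -fopenmp -fsave-optimization-record \
     *         -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -c kernel.c
     *
     * To compare run times fairly, the OpenMP build is run with a single thread:
     *   OMP_NUM_THREADS=1 ./app_openmp
     *
     * A loop reported as vectorized in the serial elision but as a missed
     * optimization in the OpenMP build points to an optimization hindered by
     * outlining and the runtime call.
     */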


Notes

  1. Faros is a transliteration of the Greek word φάρος, which means lighthouse or beacon, an analogy to our framework, which is set up to guide the analysis of OpenMP compilation.

  2. https://github.com/ggeorgakoudis/FAROS.

  3. https://llvm.org/docs/Remarks.html.

  4. Simplified for the sake of brevity.

  5. https://llvm.org/docs/TestingGuide.html.

  6. https://gcc.gnu.org/onlinedocs/gccint/Testsuites.html.


Acknowledgments

The authors would like to thank the anonymous referees for their valuable comments and helpful suggestions. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344 (LLNL-CONF-810797), and was also partially supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of two U.S. Department of Energy organizations (Office of Science and the National Nuclear Security Administration) responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering, and early testbed platforms, in support of the nation's exascale computing imperative.

Author information

Correspondence to Giorgis Georgakoudis.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Georgakoudis, G., Doerfert, J., Laguna, I., Scogland, T.R.W. (2020). FAROS: A Framework to Analyze OpenMP Compilation Through Benchmarking and Compiler Optimization Analysis. In: Milfeld, K., de Supinski, B., Koesterke, L., Klinkenberg, J. (eds.) OpenMP: Portable Multi-Level Parallelism on Modern Systems. IWOMP 2020. Lecture Notes in Computer Science, vol. 12295. Springer, Cham. https://doi.org/10.1007/978-3-030-58144-2_1


  • DOI: https://doi.org/10.1007/978-3-030-58144-2_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58143-5

  • Online ISBN: 978-3-030-58144-2

  • eBook Packages: Computer Science, Computer Science (R0)
