Abstract
The OpenMP® application programming interface (API) provides a simple way for programmers to write parallel programs that are portable across machines and vendors. (The OpenMP name is a registered trademark of the OpenMP Architecture Review Board.) Programmers parallelize their programs to obtain higher performance, but as the number of cores per processor increases, exploiting parallelism efficiently becomes more difficult. To parallelize efficiently and avoid poor utilization of machine resources, programmers need to know where an application spends its time and which factors limit its scalability.
In this paper, we present a Tool for Runtime Instrumentation of OpenMP programs (TRIO) that automatically collects statistics about an application’s use of the OpenMP runtime. TRIO reports statistics such as the total number of times each OpenMP construct is called, the time spent in each OpenMP construct, and the total time spent within the OpenMP runtime. These statistics help identify the runtime calls in which a program spends most of its time and the constructs that are called most often at runtime.
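TRIO gathers its data by instrumenting the OpenMP runtime itself; the sketch here does not do that. It only illustrates, under stated assumptions, the kind of bookkeeping that turns per-call timings into the statistics listed above: a call count and accumulated time per construct, the total time spent in the runtime, and rankings by time and by call count. The record format (a list of `(construct_name, duration_seconds)` pairs) and the construct names used in the example are hypothetical and are not TRIO's actual output format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ConstructStats:
    calls: int = 0      # number of times the construct was entered
    time: float = 0.0   # accumulated time in seconds


def summarize(events):
    """Aggregate hypothetical (construct_name, duration_seconds) records
    into per-construct call counts and accumulated times."""
    stats = defaultdict(ConstructStats)
    for name, duration in events:
        entry = stats[name]
        entry.calls += 1
        entry.time += duration
    return dict(stats)


def total_runtime_time(stats):
    """Total time spent across all recorded constructs."""
    return sum(entry.time for entry in stats.values())


def rank_by_time(stats):
    """Construct names ordered from most to least accumulated time."""
    return [name for name, entry in
            sorted(stats.items(), key=lambda kv: kv[1].time, reverse=True)]


def most_called(stats):
    """Name of the construct with the highest call count."""
    return max(stats.items(), key=lambda kv: kv[1].calls)[0]


if __name__ == "__main__":
    # Illustrative timings only; these are not measurements from the paper.
    events = [("parallel", 0.50), ("barrier", 0.25),
              ("parallel", 0.25), ("critical", 0.125)]
    stats = summarize(events)
    print(stats["parallel"].calls, stats["parallel"].time)
    print(total_runtime_time(stats), rank_by_time(stats), most_called(stats))
```

Keeping counts and times in one record per construct means both "where is the time spent" and "what is called most" can be answered from a single pass over the data.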
Notes
1.
2. These are application teams and crews and do not refer to OpenMP constructs.
Acknowledgement
This material is based upon work supported by Subcontract No. B609815 with Argonne National Laboratory and Intel Federal LLC. We thank Professor John Mellor-Crummey for his feedback on OMPT and its comparison with TRIO.
Appendix
The TRIO output shown in Fig. 4 is from the CLOMP run described in Sect. 3.2. In the interest of space, we have included only the non-zero fields. Our scripts use the “Total” column to process the results. Although the fork and join barrier times are measured separately, we sum them in the plots. The raw output also makes the relationship between OMP_idle and OMP_serial clearer: Total_OMP_idle can be computed from Total_OMP_serial as \(\mathit{Total\_OMP\_idle} = (\mathit{num\_threads} - 1) \times \mathit{Total\_OMP\_serial}\).
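The two post-processing steps described above are simple arithmetic; a minimal sketch follows. One reading of the idle formula, which the appendix implies but does not state, is that while one thread executes serial code, the remaining \(\mathit{num\_threads} - 1\) threads sit idle for the same duration. The function names are hypothetical and are not taken from the authors' scripts, and the example values are illustrative rather than data from the CLOMP run.

```python
def total_idle_from_serial(num_threads: int, total_omp_serial: float) -> float:
    """Total_OMP_idle = (num_threads - 1) * Total_OMP_serial.

    Assumes every thread other than the one running serial code is idle
    for the whole serial period.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    return (num_threads - 1) * total_omp_serial


def combined_barrier_time(fork_barrier: float, join_barrier: float) -> float:
    """Fork and join barrier times are measured separately but summed for plots."""
    return fork_barrier + join_barrier


if __name__ == "__main__":
    # Illustrative values only (seconds).
    print(total_idle_from_serial(4, 2.5))      # 3 idle threads x 2.5 s
    print(combined_barrier_time(1.25, 0.5))
```

With a single thread the formula yields zero idle time, which matches the intuition that no other thread exists to wait.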
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Doodi, T. et al. (2017). OpenMP® Runtime Instrumentation for Optimization. In: de Supinski, B., Olivier, S., Terboven, C., Chapman, B., Müller, M. (eds) Scaling OpenMP for Exascale Performance and Portability. IWOMP 2017. Lecture Notes in Computer Science, vol 10468. Springer, Cham. https://doi.org/10.1007/978-3-319-65578-9_19
DOI: https://doi.org/10.1007/978-3-319-65578-9_19
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65577-2
Online ISBN: 978-3-319-65578-9
eBook Packages: Computer Science, Computer Science (R0)