Language-Centric Performance Analysis of OpenMP Programs with Aftermath

Drebes, Andi; Bréjon, Jean-Baptiste; Pop, Antoniu; Heydemann, Karine; Cohen, Albert

doi:10.1007/978-3-319-45550-1_17

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9903)

Included in the following conference series:

International Workshop on OpenMP

Abstract

We present a new set of tools for the language-centric performance analysis and debugging of OpenMP programs that allows programmers to relate dynamic information from parallel execution to OpenMP constructs. Users can visualize execution traces, examine aggregate metrics on parallel loops and tasks, such as load imbalance or synchronization overhead, and obtain detailed information on specific events, such as the partitioning of a loop’s iteration space, its distribution to workers according to the scheduling policy and fine-grain synchronization. Our work is based on the Aftermath performance analysis tool and a ready-to-use, instrumented version of the LLVM/clang OpenMP run-time with negligible overhead for tracing. By analyzing the performance of the MG application of the NPB suite, we show that language-centric performance analysis in general and our tools in particular can help improve the performance of large-scale OpenMP applications significantly.

Notes

1. As reported by the numactl command line tool of libnuma, invoked with the --hardware option.
2. http://www.vi-hps.org/tools/opari2.html
3. http://www.bsc.es/computer-sciences/extrae
4. https://software.intel.com/en-us/articles/profiling-openmp-applications-with-intel-vtune-amplifier-xe

Acknowledgments

Our work was partly supported by the grants EU FET-HPC ExaNoDe H2020-671578, Eurolab-4-HPC H2020-671610, UK EPSRC EP/M004880/1, and France Nano 2017 DEMA. A. Pop is funded by a Royal Academy of Engineering University Research Fellowship.

Author information

Authors and Affiliations

School of Computer Science, The University of Manchester, Manchester, UK
Andi Drebes & Antoniu Pop
Sorbonne Universités, UPMC Paris 06, CNRS, UMR 7606, LIP6, Paris, France
Karine Heydemann
Inria, Paris, France
Jean-Baptiste Bréjon & Albert Cohen
École Normale Supérieure, Paris, France
Albert Cohen

Corresponding author

Correspondence to Andi Drebes.

Editor information

Editors and Affiliations

RIKEN AICS, Kobe, Japan
Naoya Maruyama
Lawrence Livermore National Laboratory, Livermore, California, USA
Bronis R. de Supinski
RIKEN AICS, Kobe, Japan
Mohamed Wahib

About this paper

Cite this paper

Drebes, A., Bréjon, JB., Pop, A., Heydemann, K., Cohen, A. (2016). Language-Centric Performance Analysis of OpenMP Programs with Aftermath. In: Maruyama, N., de Supinski, B., Wahib, M. (eds) OpenMP: Memory, Devices, and Tasks. IWOMP 2016. Lecture Notes in Computer Science, vol 9903. Springer, Cham. https://doi.org/10.1007/978-3-319-45550-1_17

DOI: https://doi.org/10.1007/978-3-319-45550-1_17
Published: 21 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45549-5
Online ISBN: 978-3-319-45550-1
eBook Packages: Computer Science, Computer Science (R0)
