Abstract
Hardware performance monitoring (HPM) is a crucial ingredient of performance analysis tools. While there are interfaces like LIKWID, PAPI, or the kernel interface perf_event that provide HPM access with some additional features, many higher-level tools combine event counts with results retrieved from other sources, such as function call traces, to derive (semi-)automatic performance advice. However, although HPM has been available on x86 systems since the early 1990s, only a small subset of the HPM features is used in practice. Performance patterns provide a more comprehensive approach, enabling the identification of various performance-limiting effects. Patterns address issues like bandwidth saturation, load imbalance, non-local data access in ccNUMA systems, or false sharing of cache lines. This work defines HPM event sets that are best suited to identify a selection of performance patterns on the Intel Haswell processor. We validate the chosen event sets for accuracy in order to arrive at a reliable pattern detection mechanism, and we point out shortcomings that cannot be easily circumvented due to bugs or limitations in the hardware.
Acknowledgments
Parts of this work were funded by the German Federal Ministry of Research and Education (BMBF) under Grant Number 01IH13009.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Röhl, T., Eitzinger, J., Hager, G., Wellein, G. (2016). Validation of Hardware Events for Successful Performance Pattern Identification in High Performance Computing. In: Knüpfer, A., Hilbrich, T., Niethammer, C., Gracia, J., Nagel, W., Resch, M. (eds) Tools for High Performance Computing 2015. Springer, Cham. https://doi.org/10.1007/978-3-319-39589-0_2
DOI: https://doi.org/10.1007/978-3-319-39589-0_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-39588-3
Online ISBN: 978-3-319-39589-0
eBook Packages: Mathematics and Statistics (R0)