
Counter Inspection Toolkit: Making Sense Out of Hardware Performance Events

  • Conference paper

Abstract

Hardware counters play an essential role in understanding the behavior of performance-critical applications, and inform any effort to identify opportunities for performance optimization. However, because modern hardware is becoming increasingly complex, the number of counters offered by the vendors increases and, in some cases, so does their complexity. In this paper we present a toolkit that aims to assist application developers invested in performance analysis by automatically categorizing and disambiguating performance counters. We present and discuss the set of microbenchmarks and analyses that we developed as part of our toolkit. We explain why they work and discuss the non-obvious reasons why some of our early benchmarks and analyses did not work, in an effort to share with the rest of the community the wisdom we acquired from negative results.
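To make the raw material of such an analysis concrete, the short C sketch below (not part of the paper itself; it assumes only the PAPI library cited as [1]) enumerates PAPI's preset events and reports which of them are actually supported on the host. This inventory of vendor-exposed counters is what a toolkit like the one described here sets out to categorize and disambiguate.

    #include <stdio.h>
    #include <papi.h>

    int main(void)
    {
        /* Initialize PAPI; the return value must match the header version. */
        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
            fprintf(stderr, "PAPI_library_init failed\n");
            return 1;
        }

        /* Walk the list of preset events (PAPI_TOT_INS, PAPI_BR_MSP, ...). */
        int code = 0 | PAPI_PRESET_MASK;
        if (PAPI_enum_event(&code, PAPI_ENUM_FIRST) != PAPI_OK)
            return 1;

        do {
            PAPI_event_info_t info;
            if (PAPI_get_event_info(code, &info) == PAPI_OK) {
                /* info.count > 0 means the preset maps to at least one
                   native counter on this machine, i.e., it is available. */
                printf("%-18s %-12s %s\n", info.symbol,
                       info.count ? "available" : "unavailable",
                       info.long_descr);
            }
        } while (PAPI_enum_event(&code, PAPI_ENUM_EVENTS) == PAPI_OK);

        return 0;
    }

Building this against PAPI (e.g., cc list_presets.c -lpapi) and running it on different machines already shows how much the exposed counter set varies from one processor to the next, which is part of the motivation for automated categorization.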


Notes

  1. The actual count is not zero, but rather a small number due to noise caused by code not shown in the figures, such as the calls to PAPI_start() and PAPI_stop(). However, in our experiments this number did not grow when the size variable was varied, so for large iteration counts the fraction of mispredicted branches approaches zero (see the measurement sketch after these notes).

  2. Other, more sophisticated goodness functions, such as Pearson's \(\chi^2\) test [7] (reproduced after these notes), could be used to assist in the analysis of the measurements, but in our experiments we found that the simple formula in Eq. 1 is sufficient.
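The measurement described in note 1 can be illustrated with the following C sketch (not taken from the paper; the event choice, the loop body, and the size parameter are assumptions). It counts mispredicted branches with PAPI around a loop whose only branch is perfectly predictable, so the reported count should stay near a small, roughly constant noise floor no matter how large size becomes:

    #include <stdio.h>
    #include <stdlib.h>
    #include <papi.h>

    int main(int argc, char **argv)
    {
        /* Hypothetical iteration count; note 1 refers to varying it. */
        long long size = (argc > 1) ? atoll(argv[1]) : 100000000LL;
        long long mispredicted = 0;
        volatile long long sum = 0;
        volatile int always_true = 1;  /* keeps the branch from being optimized away */
        int evset = PAPI_NULL;

        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) return 1;
        if (PAPI_create_eventset(&evset) != PAPI_OK) return 1;
        /* PAPI_BR_MSP: mispredicted conditional branch instructions. */
        if (PAPI_add_named_event(evset, "PAPI_BR_MSP") != PAPI_OK) return 1;

        PAPI_start(evset);
        for (long long i = 0; i < size; i++) {
            if (always_true)      /* always taken: trivially predictable */
                sum += i;
        }
        PAPI_stop(evset, &mispredicted);

        printf("iterations=%lld  mispredicted=%lld  fraction=%g\n",
               size, mispredicted, (double)mispredicted / (double)size);
        return 0;
    }

If the counter behaves as its name suggests, the printed fraction shrinks toward zero as size grows, which is exactly the effect the note describes.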
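For completeness, the Pearson \(\chi^2\) statistic mentioned in note 2 as a more sophisticated alternative has the standard form below (this is the textbook definition from [7]; the paper's own goodness function, Eq. 1, is not reproduced on this page), where \(O_i\) are observed and \(E_i\) expected values over \(n\) measurements:

\[
  \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}
\]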

References

  1. Browne, S., Dongarra, J., Garner, N., Ho, G., Mucci, P.: A portable programming interface for performance evaluation on modern processors. Int. J. High Perform. Comput. Appl. 14(3), 189–204 (2000)

  2. Intel Corporation. Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3B: System Programming Guide, Part 2 (2017)

  3. Pearson, K.: Notes on regression and inheritance in the case of two parents. Proc. R. Soc. Lond. 58, 240–242 (1895)

  4. Danalis, A., Luszczek, P., Marin, G., Vetter, J.S., Dongarra, J.: Blackjackbench: portable hardware characterization with automated results analysis. Comput. J. 57(7), 1002 (2014)

  5. McVoy, L., Staelin, C.: lmbench: portable tools for performance analysis. In: Proceedings of the USENIX 1996 Annual Technical Conference (ATEC '96), pp. 23–23. USENIX Association, Berkeley, CA, USA, 24–26 Jan 1996

  6. Mucci, P.J., London, K.: The CacheBench Report. Technical report, Computer Science Department, University of Tennessee, Knoxville, TN (1998)

  7. Pearson, K.: On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philos. Mag. 5(50), 157–175 (1900)

  8. Molnar, I.: perf: Linux profiling with performance counters (2009). https://perf.wiki.kernel.org/

  9. Wolf III, J.H.: Programming Methods for the Pentium III Processor's Streaming SIMD Extensions Using the VTune™ Performance Enhancement Environment. Intel Corporation (1999)

  10. Intel Performance Tuning Utility. http://software.intel.com/en-us/articles/intel-performance-tuning-utility/

  11. Drongowski, P.J.: An introduction to analysis and optimization with AMD CodeAnalyst™ Performance Analyzer. Advanced Micro Devices, Inc. (2008)

  12. Treibig, J., Hager, G., Wellein, G.: LIKWID: a lightweight performance-oriented tool suite for x86 multicore environments. In: Proceedings of the First International Workshop on Parallel Software Tools and Tool Infrastructures, September 2010

  13. Dongarra, J., Moore, S., Mucci, P., Seymour, K., You, H.: Accurate cache and TLB characterization using hardware counters. In: Bubak, M., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds.) International Conference on Computational Science, Lecture Notes in Computer Science, vol. 3036, Part III, pp. 432–439. Springer, Heidelberg, Kraków, Poland, June 2004. ISBN 3-540-22114-X

  14. Duchateau, A.X., Sidelnik, A., Garzarán, M.J., Padua, D.A.: P-ray: a suite of micro-benchmarks for multi-core architectures. In: Proceedings of the 21st International Workshop on Languages and Compilers for Parallel Computing (LCPC '08) (2008)

  15. González-Domínguez, J., Taboada, G.L., Fraguela, B.B., Martín, M.J., Touriño, J.: Servet: a benchmark suite for autotuning on multicore clusters. In: IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp. 1–10. IEEE Computer Society, Atlanta, GA, 19–23 Apr 2010. https://doi.org/10.1109/IPDPS.2010.5470358

  16. Molka, D., Hackenberg, D., Schöne, R., Müller, M.S.: Memory performance and cache coherency effects on an Intel Nehalem multiprocessor system. In: Proceedings of the 18th International Conference on Parallel Architectures and Compilation Techniques (PACT '09), pp. 261–270, Raleigh, North Carolina, September 12–16. IEEE Computer Society, Washington, DC, USA (2009)

  17. Staelin, C., McVoy, L.: mhz: Anatomy of a micro-benchmark. In: USENIX 1998 Annual Technical Conference, pp. 155–166. USENIX Association, New Orleans, Louisiana, 15–18 Jan 1998

  18. Yotov, K., Jackson, S., Steele, T., Pingali, K., Stodghill, P.: Automatic measurement of instruction cache capacity. In: Proceedings of the 18th Workshop on Languages and Compilers for Parallel Computing (LCPC), pp. 230–243. Springer, Hawthorne, New York, 20–22 Oct 2005

  19. Yotov, K., Pingali, K., Stodghill, P.: Automatic measurement of memory hierarchy parameters. SIGMETRICS Perform. Eval. Rev. 33(1), 181–192 (2005)


Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant No. 1450429.

Author information

Correspondence to Anthony Danalis.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Danalis, A., Jagode, H., Hanumantharayappa, Ragate, S., Dongarra, J. (2019). Counter Inspection Toolkit: Making Sense Out of Hardware Performance Events. In: Niethammer, C., Resch, M., Nagel, W., Brunst, H., Mix, H. (eds.) Tools for High Performance Computing 2017. PTHPC 2017. Springer, Cham. https://doi.org/10.1007/978-3-030-11987-4_2
