Abstract
Hardware counters play an essential role in understanding the behavior of performance-critical applications, and they inform any effort to identify opportunities for performance optimization. However, as modern hardware becomes increasingly complex, the number of counters offered by vendors increases and, in some cases, so does their complexity. In this paper we present a toolkit that aims to assist application developers invested in performance analysis by automatically categorizing and disambiguating performance counters. We present and discuss the set of microbenchmarks and analyses that we developed as part of our toolkit. We explain why they work and, to share with the community what we learned from negative results, we discuss the non-obvious reasons why some of our early benchmarks and analyses did not work.
Notes
- 1. The actual count is not zero, but rather a small number due to noise caused by code not shown in the figures, such as the calls to PAPI_start() and PAPI_stop(). However, in our experiments this number did not grow when varying the variable size, so for large iteration counts the fraction of mispredicted branches approaches zero.
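The argument in note 1 can be written out as a short bound. This is only a sketch: the symbols n, B(n), M(n), and c are introduced here for illustration and do not appear in the original text, and the bound assumes that the benchmark loop executes at least one branch per iteration.

```latex
% Assumed notation (not from the paper):
%   n    = iteration count of the benchmark loop
%   B(n) = branches executed, assumed to be at least one per iteration
%   M(n) = mispredicted branches counted, assumed bounded by a constant c
%          (the noise from code such as PAPI_start() and PAPI_stop())
\[
  0 \le \frac{M(n)}{B(n)} \le \frac{c}{n}
  \quad\Longrightarrow\quad
  \lim_{n \to \infty} \frac{M(n)}{B(n)} = 0 .
\]
```

Under these assumptions, a fixed measurement overhead becomes negligible once the iteration count is large enough, which is why the misprediction fraction can be treated as zero.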
Acknowledgements
This material is based upon work supported by the National Science Foundation under Grant No. 1450429.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Danalis, A., Jagode, H., Hanumantharayappa, Ragate, S., Dongarra, J. (2019). Counter Inspection Toolkit: Making Sense Out of Hardware Performance Events. In: Niethammer, C., Resch, M., Nagel, W., Brunst, H., Mix, H. (eds) Tools for High Performance Computing 2017. PTHPC 2017. Springer, Cham. https://doi.org/10.1007/978-3-030-11987-4_2
DOI: https://doi.org/10.1007/978-3-030-11987-4_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-11986-7
Online ISBN: 978-3-030-11987-4
eBook Packages: Mathematics and Statistics (R0)