OpenMP Application Tuning Using Hardware Performance Counters
Hardware counter events on some popular architectures were investigated with the aim of detecting bottlenecks of particular interest to shared-memory programming models such as OpenMP. A fully portable test suite was written in OpenMP, accessing the hardware performance counters by means of PAPI. Events relevant to this purpose were shown to exist on the investigated platforms. Furthermore, in most cases these events could be accessed directly through their platform-independent, PAPI predefined names. In some cases, improvements to the predefined mapping were suggested on the basis of the experiments.
Keywords: Cache Line, Local Cache, Cache Coherency, Memory Access Pattern, Cache Coherency Protocol
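To make the approach in the abstract concrete, the following is a minimal sketch of reading one PAPI predefined event per thread around an OpenMP loop. It is not taken from the paper's test suite: the function `counted_sum`, the bound `MAX_THREADS`, the build macro `USE_PAPI`, and the choice of `PAPI_L1_DCM` as the event are all assumptions made for illustration. Without `-DUSE_PAPI` the sketch still compiles and simply reports that nothing was measured.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif
#ifdef USE_PAPI   /* assumption: compile with -DUSE_PAPI and link with -lpapi */
#include <papi.h>
#endif

#define MAX_THREADS 64   /* hypothetical upper bound used by this sketch */

/* Returns the calling thread's index, or 0 when built without OpenMP. */
static int thread_index(void) {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/*
 * Sums a[0..n-1] with an OpenMP reduction. When built with USE_PAPI, each
 * thread counts PAPI_L1_DCM (L1 data cache misses) around its share of the
 * loop and stores the value in misses[thread]. Entries stay at -1 when the
 * event was not measured. Intended to be called once per process.
 */
double counted_sum(const double *a, long n, long long misses[MAX_THREADS]) {
    double sum = 0.0;
    for (int t = 0; t < MAX_THREADS; t++) misses[t] = -1;

#ifdef USE_PAPI
    int papi_ok = (PAPI_library_init(PAPI_VER_CURRENT) == PAPI_VER_CURRENT);
#ifdef _OPENMP
    /* Tell PAPI how to identify threads before any thread uses it. */
    papi_ok = papi_ok &&
        PAPI_thread_init((unsigned long (*)(void)) omp_get_thread_num) == PAPI_OK;
#endif
    /* A predefined name is only usable if the platform maps it to a native event. */
    papi_ok = papi_ok && PAPI_query_event(PAPI_L1_DCM) == PAPI_OK;
#endif

    #pragma omp parallel reduction(+:sum)
    {
        int t = thread_index();
#ifdef USE_PAPI
        int es = PAPI_NULL;          /* per-thread event set */
        long long count = 0;
        int counting = papi_ok
            && PAPI_create_eventset(&es) == PAPI_OK
            && PAPI_add_event(es, PAPI_L1_DCM) == PAPI_OK
            && PAPI_start(es) == PAPI_OK;
#endif
        #pragma omp for
        for (long i = 0; i < n; i++) sum += a[i];
#ifdef USE_PAPI
        if (counting && PAPI_stop(es, &count) == PAPI_OK && t < MAX_THREADS)
            misses[t] = count;
#endif
        (void) t;   /* unused when built without USE_PAPI */
    }
    return sum;
}
```

A caller would pass an array of `MAX_THREADS` entries and compare the per-thread values afterwards; large differences between threads doing equal work can hint at a memory access pattern worth inspecting. Swapping `PAPI_L1_DCM` for another predefined event only requires that `PAPI_query_event` reports it as available on the platform.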