Automatic Measurement of Instruction Cache Capacity

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4339)

Abstract

There is growing interest in autonomic computing systems that can optimize their own behavior on different platforms without manual intervention. Examples of successful self-optimizing systems are ATLAS, which generates Basic Linear Algebra Subprograms (BLAS) libraries, and FFTW, which generates FFT libraries.

Self-optimizing systems may need the values of hardware parameters such as the number of registers of various types and the capacities of caches at various levels. For example, ATLAS uses the capacity of the L1 cache and the number of registers in determining the size of cache tiles and register tiles.
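
As a rough illustration of how a measured cache capacity can feed into such a decision, the sketch below picks the largest square tile of doubles that fits in a given L1 data cache. It is a simplified heuristic for illustration only, not the model ATLAS actually uses; the function name `max_square_tile`, the `tiles_resident` parameter, and the 32 KiB capacity are assumptions introduced here.

```python
import math


def max_square_tile(cache_bytes: int, elem_bytes: int = 8, tiles_resident: int = 1) -> int:
    """Return the largest NB with tiles_resident * NB * NB * elem_bytes <= cache_bytes.

    cache_bytes    -- measured cache capacity in bytes (hypothetical input)
    elem_bytes     -- size of one matrix element (8 for a double)
    tiles_resident -- how many NB x NB tiles should fit at once (assumption)
    """
    if cache_bytes <= 0 or elem_bytes <= 0 or tiles_resident <= 0:
        raise ValueError("all arguments must be positive")
    # Integer square root of the number of elements available per tile.
    return math.isqrt(cache_bytes // (elem_bytes * tiles_resident))


if __name__ == "__main__":
    # One tile in a 32 KiB cache, then three tiles (e.g. one each for A, B, C).
    print(max_square_tile(32 * 1024))
    print(max_square_tile(32 * 1024, tiles_resident=3))
```

With a measured capacity in hand, a generator could use a bound like this to limit the tile sizes it considers instead of searching blindly.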

We have built a system called X-Ray, which uses micro-benchmarks to measure such parameter values automatically. The micro-benchmarks currently implemented in X-Ray can determine the latency of various instructions, the existence of important instructions like fused multiply-add, the number of registers of various kinds, and parameters of the memory hierarchy.

In this paper, we discuss how X-Ray determines the capacity of the instruction cache (I-cache), which is needed for important optimizations such as loop unrolling. We present the micro-benchmark used in X-Ray to measure I-cache capacity, the experimental methodology used to obtain accurate estimates, and experimental results on a large number of current platforms.
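
The abstract does not describe the micro-benchmark itself. One common way to estimate a cache capacity from timing data is to run code blocks of increasing size and look for the point where the cost per instruction jumps. The sketch below shows only that analysis step, on synthetic numbers; the function name `estimate_capacity`, the `jump_ratio` threshold, and the sample data are assumptions and should not be read as X-Ray's actual method or results.

```python
def estimate_capacity(samples, jump_ratio=1.5):
    """Estimate capacity from (code_size_bytes, time_per_instruction) pairs.

    samples must be sorted by increasing code size. Returns the largest code
    size observed before the per-instruction time first grows by more than
    jump_ratio relative to the previous sample, or None if no jump is seen.
    """
    for (prev_size, prev_time), (_, cur_time) in zip(samples, samples[1:]):
        if prev_time > 0 and cur_time / prev_time > jump_ratio:
            return prev_size
    return None


if __name__ == "__main__":
    # Synthetic timings (arbitrary units): flat up to 32 KiB, then a jump.
    synthetic = [
        (4096, 1.00),
        (8192, 1.01),
        (16384, 1.02),
        (32768, 1.03),
        (65536, 2.40),
        (131072, 2.45),
    ]
    print(estimate_capacity(synthetic))
```

Real measurements are noisy, so a practical tool would need repeated runs and a more robust detection rule than a single fixed ratio.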

This work was supported by an IBM Faculty Partnership Award, DARPA grant NBCH30390004, and by NSF grants ACI-0085969, ACI-0090217, ACI-0103723, ACI-0121401, and ACI-0406345.


Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yotov, K., Jackson, S., Steele, T., Pingali, K., Stodghill, P. (2006). Automatic Measurement of Instruction Cache Capacity. In: Ayguadé, E., Baumgartner, G., Ramanujam, J., Sadayappan, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2005. Lecture Notes in Computer Science, vol 4339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69330-7_16

  • DOI: https://doi.org/10.1007/978-3-540-69330-7_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69329-1

  • Online ISBN: 978-3-540-69330-7

  • eBook Packages: Computer Science, Computer Science (R0)
