Efficient analytical modelling of multi-level set-associative caches
The time a program takes to execute is significantly affected by how efficiently it uses cache memory. Moreover, the cache miss behaviour of a program is highly unstable, in that small changes to input parameters can cause large changes in the number of misses. In this paper we describe novel analytical methods for predicting the cache miss ratio of numerical programs on sequential hierarchies of set-associative caches. The methods are demonstrated to be applicable to most loop nests. They are also shown to be highly accurate, yet can be evaluated orders of magnitude faster than a comparable simulation.
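The instability described above can be illustrated with a minimal sketch of the kind of trace-driven simulation that an analytical model is meant to replace. This is not the paper's method. The cache geometry (1 KiB, 32-byte lines, 2-way, LRU replacement), the function names, and the access pattern are all assumptions chosen for illustration, and only a single cache level is modelled.

```python
from collections import OrderedDict


def simulate_misses(addresses, cache_bytes=1024, line_bytes=32, ways=2):
    """Count misses for a byte-address trace on one set-associative LRU cache.

    All geometry parameters are illustrative assumptions, not values from the paper.
    """
    num_sets = cache_bytes // (line_bytes * ways)
    # One OrderedDict per set; insertion order tracks recency (oldest first).
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        lines_in_set = sets[line % num_sets]
        if line in lines_in_set:
            lines_in_set.move_to_end(line)  # hit: mark as most recently used
        else:
            misses += 1
            if len(lines_in_set) >= ways:
                lines_in_set.popitem(last=False)  # evict least recently used
            lines_in_set[line] = True
    return misses


def strided_trace(stride_bytes, elements=4, repeats=10):
    """Hypothetical access pattern: sweep `elements` strided addresses, `repeats` times."""
    return [i * stride_bytes for _ in range(repeats) for i in range(elements)]


if __name__ == "__main__":
    for stride in (512, 544):
        print(stride, simulate_misses(strided_trace(stride)))
```

With these assumed parameters, a 512-byte stride maps all four addresses to the same set, so the 2-way set thrashes and every one of the 40 accesses misses. Padding the stride by one cache line (544 bytes) spreads the addresses over four sets and leaves only the 4 cold misses. A simulator has to replay every access to find this out, which is the cost that an analytical prediction aims to avoid.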