A Compiler Framework for Tiling Imperfectly-Nested Loops
This paper presents an integrated compiler framework for tiling a class of nontrivial imperfectly-nested loops to improve cache locality. We develop a new memory cost model that analyzes data reuse in terms of both the cache and the TLB, and we use it to compute the tile size with or without array duplication. We decide whether to duplicate arrays for tiling by comparing the reuse factors exploited in each case. Preliminary results on several benchmark programs show that the transformed programs achieve speedups of 1.09 to 3.82 over the original programs.
Keywords: Cache Size, Tile Size, Program Language Design, Innermost Loop, Tiling Scheme
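To make the term "imperfectly-nested loop" concrete, the following minimal sketch tiles a small matrix-multiply kernel whose initialization statement sits outside the innermost loop. It is an illustration only, not the framework described in the abstract: the function names `matmul_original` and `matmul_tiled` and the `tile` parameter are hypothetical, the tile size is supplied by hand rather than computed from a memory cost model, and no array duplication is performed.

```python
def matmul_original(A, B, n):
    """Imperfectly nested: the initialization sits at the j level,
    while the accumulation sits one level deeper, at the k level."""
    C = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            C[i][j] = 0                      # statement outside the innermost loop
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_tiled(A, B, n, tile):
    """Tiled variant (hypothetical sketch): the j and k loops are split
    into blocks of `tile` iterations so that a tile-by-tile block of B
    is reused across all values of i before moving on."""
    C = [[None] * n for _ in range(n)]
    for jj in range(0, n, tile):             # tile-controlling loop over j
        for kk in range(0, n, tile):         # tile-controlling loop over k
            for i in range(n):
                for j in range(jj, min(jj + tile, n)):
                    if kk == 0:
                        # The initialization must run exactly once per (i, j),
                        # before the first k tile touches C[i][j].
                        C[i][j] = 0
                    for k in range(kk, min(kk + tile, n)):
                        C[i][j] += A[i][k] * B[k][j]
    return C


if __name__ == "__main__":
    n = 5
    A = [[i + j for j in range(n)] for i in range(n)]
    B = [[i * j + 1 for j in range(n)] for i in range(n)]
    # Both versions should compute the same product for any tile size.
    print(matmul_tiled(A, B, n, 2) == matmul_original(A, B, n))
```

The guard `if kk == 0` is what keeps the transformation legal here: because the statement outside the innermost loop is not covered by the k loop, it cannot simply be carried along when the k loop is blocked. Choosing `tile` well is the part that, per the abstract, the paper's cost model addresses; this sketch leaves it as an input.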