Abstract
One of the key challenges facing computer architects and compiler writers is the increasing discrepancy between processor cycle times and main memory access times. To alleviate this problem for a class of array-dominated codes, compilers may employ either control-centric transformations that change data access patterns of nested loops or data-centric transformations that modify the memory layouts of multi-dimensional arrays. Most of the layout optimizations proposed so far either modify the layout of each array independently or are based on explicit data reorganizations at runtime.
This paper describes a compiler technique, called array unification, that automatically maps multiple arrays into a single data (array) space to improve data locality. We present a mathematical framework that enables us to systematically derive suitable mappings for a given program. The framework divides the arrays accessed by the program into several groups and transforms each group to improve spatial locality and reduce the number of conflict misses. Compared to previous approaches, the proposed technique works on a larger scope and also applies independent layout transformations whenever necessary. Preliminary results on two benchmark codes show significant improvements in cache miss rates and execution time.
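To make the idea concrete, the following is a minimal sketch, not the paper's framework. It assumes the simplest possible unification, in which two one-dimensional arrays A and B are interleaved element by element into a single array, and it counts misses in a hypothetical direct-mapped cache. The cache geometry, the array size, and all function names are illustrative assumptions.

```python
def count_misses(addresses, num_lines=64, line_size=8):
    """Count misses in a toy direct-mapped cache.

    Addresses and line_size are measured in array elements (an assumption
    made to keep the arithmetic simple).
    """
    tags = [None] * num_lines
    misses = 0
    for addr in addresses:
        block = addr // line_size      # memory block holding this element
        slot = block % num_lines       # cache line the block maps to
        if tags[slot] != block:
            tags[slot] = block
            misses += 1
    return misses


def separate_trace(n, base_a, base_b):
    """Access A[i] then B[i], with A and B stored as two separate arrays."""
    for i in range(n):
        yield base_a + i
        yield base_b + i


def unified_trace(n):
    """Access the same data after interleaving: A[i] -> X[2i], B[i] -> X[2i+1]."""
    for i in range(n):
        yield 2 * i
        yield 2 * i + 1


N = 1024
# Placing B exactly N elements after A makes A[i] and B[i] map to the same
# cache line in this toy cache (N is a multiple of the 512-element cache),
# so the separate layout thrashes.
separate_misses = count_misses(separate_trace(N, base_a=0, base_b=N))
unified_misses = count_misses(unified_trace(N))
print(separate_misses, unified_misses)
```

Under these assumptions every access in the separate layout is a conflict miss (2048 in total), while the unified layout misses once per cache line (256 in total). The base addresses were chosen deliberately as a worst case; real programs will see a smaller gap.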
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kandemir, M.T. (2001). Array Unification: A Locality Optimization Technique. In: Wilhelm, R. (ed.) Compiler Construction. CC 2001. Lecture Notes in Computer Science, vol 2027. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45306-7_18
DOI: https://doi.org/10.1007/3-540-45306-7_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41861-0
Online ISBN: 978-3-540-45306-2
eBook Packages: Springer Book Archive