Array Unification: A Locality Optimization Technique

  • Mahmut Taylan Kandemir
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2027)


One of the key challenges facing computer architects and compiler writers is the increasing discrepancy between processor cycle times and main memory access times. To alleviate this problem for a class of array-dominated codes, compilers may employ either control-centric transformations that change data access patterns of nested loops or data-centric transformations that modify the memory layouts of multi-dimensional arrays. Most of the layout optimizations proposed so far either modify the layout of each array independently or are based on explicit data reorganizations at runtime.
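The distinction between the two families of transformations can be illustrated with a small sketch. It is not taken from the paper: the helper names (`row_major`, `col_major`, `strides`) and the 4x4 array size are assumptions chosen for illustration. The sketch lists the linear addresses touched by a loop nest that walks a row-major array column by column, then shows that either interchanging the loops (control-centric) or switching the array to a column-major layout (data-centric) yields unit-stride accesses.

```python
def row_major(i, j, n_rows, n_cols):
    """Linear offset of element (i, j) in a row-major layout."""
    return i * n_cols + j


def col_major(i, j, n_rows, n_cols):
    """Linear offset of element (i, j) in a column-major layout."""
    return j * n_rows + i


def strides(addresses):
    """Differences between consecutive addresses in an access trace."""
    return [b - a for a, b in zip(addresses, addresses[1:])]


SIZE = 4  # hypothetical array extent, kept tiny so the traces are readable

# Original nest: j is the outer loop, i the inner loop, layout is row-major.
original = [row_major(i, j, SIZE, SIZE)
            for j in range(SIZE) for i in range(SIZE)]

# Control-centric fix: interchange the loops, keep the row-major layout.
interchanged = [row_major(i, j, SIZE, SIZE)
                for i in range(SIZE) for j in range(SIZE)]

# Data-centric fix: keep the loop order, store the array column-major.
relayout = [col_major(i, j, SIZE, SIZE)
            for j in range(SIZE) for i in range(SIZE)]

print(strides(original)[:3])      # stride of SIZE between inner iterations
print(strides(interchanged)[:3])  # unit stride
print(strides(relayout)[:3])      # unit stride
```

Both fixes produce the same sequential trace here; in practice the choice depends on whether data dependences permit the interchange and on how other loop nests access the same array.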

This paper describes a compiler technique, called array unification, that automatically maps multiple arrays into a single data (array) space to improve data locality. We present a mathematical framework that enables us to systematically derive suitable mappings for a given program. The framework divides the arrays accessed by the program into several groups and transforms each group to improve spatial locality and reduce the number of conflict misses. Compared to previous approaches, the proposed technique operates over a larger scope and also applies independent layout transformations whenever necessary. Preliminary results on two benchmark codes show significant improvements in cache miss rates and execution time.
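A minimal sketch of why mapping two arrays into one array space can reduce conflict misses follows. It assumes the simplest possible mapping, element-wise interleaving, and a toy direct-mapped cache; the paper's framework for deriving mappings is not reproduced here. The function `count_misses` and all sizes and base addresses are hypothetical, chosen so that the two separate arrays collide in the cache.

```python
def count_misses(addresses, num_lines, line_size):
    """Count misses in a toy direct-mapped cache (sizes in array elements)."""
    cache = [None] * num_lines          # block currently held by each line
    misses = 0
    for addr in addresses:
        block = addr // line_size       # memory block containing the element
        slot = block % num_lines        # direct-mapped placement
        if cache[slot] != block:
            cache[slot] = block
            misses += 1
    return misses


# Hypothetical parameters: 16 lines of 4 elements, i.e. a 64-element cache.
NUM_ELEMS = 64
LINE_SIZE = 4
NUM_LINES = 16
BASE_A = 0
BASE_B = NUM_LINES * LINE_SIZE  # b starts exactly one cache size after a

separate_trace = []  # loop body touches a[i] then b[i] in separate arrays
unified_trace = []   # same loop on u, with u[2*i] = a[i] and u[2*i+1] = b[i]
for i in range(NUM_ELEMS):
    separate_trace += [BASE_A + i, BASE_B + i]
    unified_trace += [2 * i, 2 * i + 1]

separate_misses = count_misses(separate_trace, NUM_LINES, LINE_SIZE)
unified_misses = count_misses(unified_trace, NUM_LINES, LINE_SIZE)
print(separate_misses, unified_misses)
```

Under these assumed parameters, a[i] and b[i] always map to the same cache line, so the separate layout misses on all 128 accesses, while the interleaved layout walks memory sequentially and misses once per line (32 misses). Real caches have associativity and real programs have more varied access patterns, so the gap shown here is an upper-end illustration rather than a prediction.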


Keywords (machine-generated, not supplied by the authors): Data Transformation, Nest Loop, Cache Line, Array Variable, Multidimensional Array



Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Mahmut Taylan Kandemir (1)
  1. Computer Science and Engineering Department, The Pennsylvania State University, University Park, USA
