Inter-array Data Regrouping

  • Chen Ding
  • Ken Kennedy
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1863)

Abstract

As the speed gap between CPU and memory widens, the memory hierarchy has become the performance bottleneck for most applications because of both the high latency and the low bandwidth of direct memory access. With the recent introduction of latency-hiding strategies on modern machines, limited memory bandwidth has become the primary performance constraint and, consequently, the effective use of available memory bandwidth has become critical. Since memory data is transferred one cache block at a time, improving the utilization of cache blocks can directly improve memory bandwidth utilization and program performance. However, existing optimizations do not maximize cache-block utilization because they are intra-array; that is, they improve only data reuse within single arrays, and they do not group useful data of multiple arrays into the same cache block. In this paper, we present inter-array data regrouping, a global data transformation that first splits and then selectively regroups all data arrays in a program. The new transformation is optimal in the sense that it exploits inter-array cache-block reuse when and only when it is always profitable. When evaluated on real-world programs with both regular contiguous data access and irregular, dynamic data access, inter-array data regrouping transforms as many as 26 arrays in a program and improves the overall performance by as much as 32%.
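The cache-block argument above can be illustrated with a small sketch. It is not the paper's algorithm (which decides which arrays to regroup); it only counts how many cache blocks a loop touches under two layouts. The 64-byte block size, the 8-byte element size, the function names, and the strided access pattern are all assumptions chosen for illustration.

```python
# Illustrative model only: block size, element size, and all names are assumed.
CACHE_BLOCK_BYTES = 64  # assumed cache-block size
ELEM_BYTES = 8          # assumed element size (e.g. a double)


def blocks_touched(addresses):
    """Count the distinct cache blocks covered by a list of byte addresses."""
    return len({addr // CACHE_BLOCK_BYTES for addr in addresses})


def separate_layout(num_arrays, n, indices):
    """Addresses when each array is stored contiguously, one after another."""
    return [(k * n + i) * ELEM_BYTES for i in indices for k in range(num_arrays)]


def regrouped_layout(num_arrays, n, indices):
    """Addresses when element i of every array is stored side by side."""
    return [(i * num_arrays + k) * ELEM_BYTES for i in indices for k in range(num_arrays)]


n = 1024
indices = range(0, n, 8)  # sparse access: one element per block of each array

# A loop that reads a[i] and b[i] together for every i in indices.
separate = blocks_touched(separate_layout(2, n, indices))
regrouped = blocks_touched(regrouped_layout(2, n, indices))
print(separate, regrouped)
```

Under these assumptions the separate layout touches 256 blocks and the regrouped layout touches 128, because each fetched block now carries useful elements of both arrays. The benefit depends on the two arrays always being accessed together, which is the condition the abstract refers to as "always profitable".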

Keywords

Data Access; Memory Bandwidth; Computation Phase; Direct Memory Access; Spatial Reuse

Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Chen Ding (1)
  • Ken Kennedy (1)
  1. Rice University, USA