
Applying Data Copy to Improve Memory Performance of General Array Computations

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4339)

Abstract

Data copy is an important compiler optimization which dynamically rearranges the layout of arrays by copying their elements into local buffers. Traditionally, array copy is considered expensive and has been applied only to the working sets of fully blocked computations. This paper presents an algorithm which automatically applies data copy to optimize the performance of general computations independent of blocking. The algorithm automatically decides where to insert copy operations and which regions of arrays to copy. In addition, when specialized, it is equivalent to a general scalar replacement algorithm on arbitrary array computations. The algorithm is fully implemented and has been applied to optimize several scientific kernels. The results show that the algorithm is highly effective and that data copy can significantly improve the performance of scientific computations, both when combined with blocking and when applied alone without blocking.
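As a rough illustration of what data copy means in practice, the sketch below applies it by hand to an unblocked matrix multiplication in C. It is not the paper's algorithm, which decides automatically where to copy and which array regions to copy. The function names `matmul_plain` and `matmul_copy`, the row-major layout, and the choice to copy one column of `B` at a time are assumptions made only for this example. The accumulator `acc` also shows the related idea of scalar replacement, where a repeatedly accessed array element is held in a scalar.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Baseline: C += A * B for n x n row-major matrices.
   B[k * n + j] is read with stride n in the innermost loop. */
void matmul_plain(size_t n, const double *A, const double *B, double *C) {
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            for (size_t k = 0; k < n; k++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

/* Hand-applied data copy (illustrative only): column j of B is copied
   into a contiguous local buffer once, then reused by every iteration
   of the i loop. Returns 0 on success, -1 if the buffer cannot be
   allocated. */
int matmul_copy(size_t n, const double *A, const double *B, double *C) {
    if (n == 0)
        return 0;
    assert(A != NULL && B != NULL && C != NULL);

    double *bcol = malloc(n * sizeof *bcol);
    if (bcol == NULL)
        return -1;

    for (size_t j = 0; j < n; j++) {
        /* Copy operation: gather the strided column into unit stride. */
        for (size_t k = 0; k < n; k++)
            bcol[k] = B[k * n + j];

        for (size_t i = 0; i < n; i++) {
            double acc = C[i * n + j];   /* scalar replacement of C[i][j] */
            for (size_t k = 0; k < n; k++)
                acc += A[i * n + k] * bcol[k];
            C[i * n + j] = acc;
        }
    }
    free(bcol);
    return 0;
}
```

Both functions compute the same result. The copy costs n extra loads and stores per column, and that cost is spread over the n iterations of the i loop that reuse the buffer. Whether this pays off depends on the problem size and the cache behavior of the target machine.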

This work was carried out while the author was employed by Lawrence Livermore National Laboratory, Livermore, CA 94550.




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yi, Q. (2006). Applying Data Copy to Improve Memory Performance of General Array Computations. In: Ayguadé, E., Baumgartner, G., Ramanujam, J., Sadayappan, P. (eds) Languages and Compilers for Parallel Computing. LCPC 2005. Lecture Notes in Computer Science, vol 4339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69330-7_7


  • DOI: https://doi.org/10.1007/978-3-540-69330-7_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69329-1

  • Online ISBN: 978-3-540-69330-7

  • eBook Packages: Computer Science, Computer Science (R0)
