Locality Enhancement by Array Contraction

  • Yonghong Song
  • Cheng Wang
  • Zhiyuan Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2624)

Abstract

In this paper, we study how array contraction can enhance cache locality and improve performance. In our previous work, we developed a memory minimization scheme, SFC, which combines loop shifting, loop fusion, and array contraction. SFC focuses on reducing the memory requirement and, as a by-product, may enhance cache locality. Here we develop a memory cost model for SFC and present a fusion algorithm so that the predicted locality enhancement can be realized. Experimental results on both a real machine and a simulator demonstrate the effectiveness of array contraction for cache locality enhancement and performance improvement.
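
To make the three transformations named in the abstract concrete, the following is a minimal sketch in Python. It is not the authors' SFC algorithm or cost model; the function names and the toy computation are assumptions chosen for illustration. The second loop reads `a[i + 1]`, so the loops cannot be fused directly. Shifting the second loop by one iteration makes fusion legal, and after fusion the temporary array `a` contracts to two scalars.

```python
def two_loops(b):
    """Unfused version: the temporary array `a` holds n elements."""
    n = len(b)
    a = [0] * n
    c = [0] * max(n - 1, 0)
    for i in range(n):          # loop 1 produces a[i]
        a[i] = b[i] + 1
    for i in range(n - 1):      # loop 2 reads a[i] and a[i + 1]
        c[i] = a[i] + a[i + 1]
    return c


def shifted_fused_contracted(b):
    """Hypothetical shifted, fused, and contracted version.

    Loop 2 is shifted by one iteration so that its iteration i - 1 runs
    inside fused iteration i, after a[i] is available. Only a[i - 1] and
    a[i] are live at once, so `a` contracts to the scalars prev and cur.
    """
    n = len(b)
    c = [0] * max(n - 1, 0)
    prev = 0                    # stands in for a[i - 1]
    for i in range(n):
        cur = b[i] + 1          # stands in for a[i]
        if i >= 1:
            c[i - 1] = prev + cur   # loop 2's iteration i - 1
        prev = cur
    return c
```

In this sketch the temporary storage shrinks from n elements to two scalars, and each value of `a` is consumed right after it is produced. Whether this yields a measurable cache benefit on a given machine depends on factors such as array sizes and cache parameters.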

Keywords

Cache Size, Fusion Algorithm, Cache Line, Locality Enhancement, Cache Locality
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Yonghong Song (1)
  • Cheng Wang (2)
  • Zhiyuan Li (2)
  1. Sun Microsystems, Inc., Palo Alto, USA
  2. Department of Computer Sciences, Purdue University, West Lafayette, USA
