Abstract
Existing loop fusion algorithms fuse loop nests only when fusion violates none of the data dependences between them. This paper presents a new algorithm that can fuse loop nests even in the presence of fusion-preventing anti-dependences. The dependences that fusion would violate are eliminated by automatic array copying. In this work, this aggressive loop fusion strategy is applied to a Jacobi program. The performance of such iterative methods is typically limited by the speed of the memory system. Fusing the two loop nests in the Jacobi program into one reduces data cache misses and consequently improves the performance of both the sequential and parallel versions of the program, as validated by our experimental results on an HP AlphaServer SC45 supercomputer.
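The idea of fusing across an anti-dependence can be illustrated with a minimal sketch. It assumes a 1D, two-point Jacobi-style update, which is much simpler than the program studied in the paper, and it is not the paper's algorithm. The function names and the `old_left` variable are hypothetical. The unfused version uses two loops per time step. The naive fusion overwrites `a[i]` before the next iteration has read its old value, which violates the anti-dependence. The repaired fusion keeps a copy of that old value.

```python
def jacobi_unfused(a, steps):
    """Reference version: two separate loops per time step."""
    a = list(a)
    n = len(a)
    b = [0.0] * n
    for _ in range(steps):
        for i in range(1, n - 1):      # loop 1: reads a, writes b
            b[i] = 0.5 * (a[i - 1] + a[i + 1])
        for i in range(1, n - 1):      # loop 2: copies b back into a
            a[i] = b[i]
    return a


def jacobi_fused_naive(a, steps):
    """Incorrect fusion: a[i-1] is overwritten before iteration i reads it."""
    a = list(a)
    n = len(a)
    for _ in range(steps):
        for i in range(1, n - 1):
            a[i] = 0.5 * (a[i - 1] + a[i + 1])  # a[i-1] is already the NEW value
    return a


def jacobi_fused(a, steps):
    """Fusion repaired by copying: old_left holds the pre-update a[i-1].

    Assumption: one saved element is enough here because the violated
    anti-dependence has distance 1 in this simplified 1D stencil.
    """
    a = list(a)
    n = len(a)
    if n < 3:
        return a                       # no interior points to update
    for _ in range(steps):
        old_left = a[0]                # boundary value never changes
        for i in range(1, n - 1):
            new_val = 0.5 * (old_left + a[i + 1])  # a[i+1] is still old
            old_left = a[i]            # save old a[i] before overwriting it
            a[i] = new_val
    return a
```

With the input `[1.0, 0.0, 0.0, 0.0, 0.0]`, the repaired fused version agrees with the unfused one after any number of steps, while the naive fusion produces different values. The fused version traverses the array once per time step instead of twice, which is the source of the locality benefit the abstract describes.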
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xue, J. (2005). Aggressive Loop Fusion for Improving Locality and Parallelism. In: Pan, Y., Chen, D., Guo, M., Cao, J., Dongarra, J. (eds) Parallel and Distributed Processing and Applications. ISPA 2005. Lecture Notes in Computer Science, vol 3758. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11576235_28
DOI: https://doi.org/10.1007/11576235_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29769-7
Online ISBN: 978-3-540-32100-2
eBook Packages: Computer Science, Computer Science (R0)