Aggressive Loop Fusion for Improving Locality and Parallelism

Xue, Jingling

doi:10.1007/11576235_28

Jingling Xue

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3758)

Included in the following conference series:

International Symposium on Parallel and Distributed Processing and Applications

Abstract

Existing loop fusion algorithms fuse loop nests only when doing so violates none of the dependences in the loop nests. This paper presents a new algorithm that is capable of fusing loop nests in the presence of fusion-preventing anti-dependences. We eliminate all of these violated dependences by automatic array copying. In this work, this aggressive loop fusion strategy is applied to a Jacobi program. The performance of iterative methods such as Jacobi is typically limited by the speed of the memory system. Fusing the two loop nests in the Jacobi program into one reduces data cache misses and, consequently, improves the performance of both the sequential and parallel versions of the program, as validated by our experimental results on an HP AlphaServer SC45 supercomputer.
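The idea in the abstract can be illustrated with a small sketch. The code below is not the paper's algorithm: it is a simplified one-dimensional Jacobi sweep written for this page, and all function names are hypothetical. It assumes a three-point averaging stencil with fixed boundary values. The unfused version uses two loops (compute, then copy back). Fusing them naively violates an anti-dependence, because `a[i-1]` is overwritten before iteration `i` reads its old value. Keeping a copy of the old value (here a single scalar, standing in for the array copying the abstract mentions) makes the fused loop produce the same result.

```python
def jacobi_unfused(a, steps):
    """Reference version: two separate loops per sweep."""
    a = list(a)
    n = len(a)
    for _ in range(steps):
        new = a[:]
        for i in range(1, n - 1):          # loop 1: read old values of a
            new[i] = (a[i - 1] + a[i + 1]) / 2.0
        for i in range(1, n - 1):          # loop 2: write a
            a[i] = new[i]
    return a


def jacobi_fused_naive(a, steps):
    """Illegal fusion: a[i-1] is already updated when iteration i reads it."""
    a = list(a)
    n = len(a)
    for _ in range(steps):
        for i in range(1, n - 1):
            a[i] = (a[i - 1] + a[i + 1]) / 2.0
    return a


def jacobi_fused_copy(a, steps):
    """Fused loop made legal by copying the old value before overwriting it."""
    a = list(a)
    n = len(a)
    for _ in range(steps):
        prev_old = a[0]                    # copy of the old a[i-1]
        for i in range(1, n - 1):
            cur_old = a[i]                 # save a[i] before it is overwritten
            a[i] = (prev_old + a[i + 1]) / 2.0
            prev_old = cur_old
    return a


if __name__ == "__main__":
    grid = [8.0, 0.0, 0.0, 0.0, 0.0]
    print(jacobi_unfused(grid, 1))
    print(jacobi_fused_naive(grid, 1))
    print(jacobi_fused_copy(grid, 1))
```

In this sketch the naive fusion silently turns the Jacobi sweep into a Gauss-Seidel-style sweep, which is why the results differ, while the copying version traverses the array once per sweep and still matches the two-loop reference.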

Author information

Authors and Affiliations

Programming Languages and Compilers Group, School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia
Jingling Xue

Editor information

Editors and Affiliations

Dept. of CS, Georgia State University, 30302, Atlanta, GA, USA
Yi Pan
State Key Laboratory for Novel Software Technology, Nanjing University, 210093, Nanjing, Jiangsu, China
Daoxu Chen
Department of Computer Science and Engineering, Shanghai Jiao Tong University, 200030, Shanghai, China
Minyi Guo
Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong, China
Jiannong Cao
Computer Science Department, University of Tennessee, 37996-3450, Knoxville, TN, USA
Jack Dongarra

About this paper

Cite this paper

Xue, J. (2005). Aggressive Loop Fusion for Improving Locality and Parallelism. In: Pan, Y., Chen, D., Guo, M., Cao, J., Dongarra, J. (eds) Parallel and Distributed Processing and Applications. ISPA 2005. Lecture Notes in Computer Science, vol 3758. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11576235_28

DOI: https://doi.org/10.1007/11576235_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29769-7
Online ISBN: 978-3-540-32100-2
eBook Packages: Computer Science, Computer Science (R0)
