
Aggressive Loop Fusion for Improving Locality and Parallelism

  • Conference paper
Parallel and Distributed Processing and Applications (ISPA 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3758)

Abstract

Existing loop fusion algorithms fuse loop nests only when fusion violates none of the dependences in those nests. This paper presents a new algorithm that can fuse loop nests even in the presence of fusion-preventing anti-dependences, eliminating all such violated dependences by automatic array copying. In this work, this aggressive loop fusion strategy is applied to a Jacobi program. The performance of iterative methods such as Jacobi is typically limited by the speed of the memory system. Fusing the two loop nests of the Jacobi program into one reduces data cache misses and consequently improves the performance of both the sequential and parallel versions of the program, as validated by our experimental results on an HP AlphaServer SC45 supercomputer.
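The abstract describes the transformation only in prose. The C sketch below is a hedged illustration, not the paper's algorithm: it assumes the common two-nest form of a 2-D Jacobi sweep (a stencil nest writing `B`, then a copy nest writing `A` back), and the function names, the grid size `N` and the initial data are all hypothetical. Fusing the two nests directly lets iteration (i, j) overwrite `A[i][j]` before iterations (i+1, j) and (i, j+1) read its old value, which is the fusion-preventing anti-dependence. The fused version removes that violation by copying the old rows i-1 and i into two small buffers before they are overwritten, one simple way to apply array copying.

```c
#include <string.h>

#define N 8  /* hypothetical grid size, chosen only for illustration */

/* Original Jacobi sweep: two separate loop nests over the interior. */
void jacobi_unfused(double A[N][N], double B[N][N]) {
    for (int i = 1; i < N - 1; i++)
        for (int j = 1; j < N - 1; j++)
            B[i][j] = 0.25 * (A[i-1][j] + A[i+1][j] + A[i][j-1] + A[i][j+1]);
    for (int i = 1; i < N - 1; i++)
        for (int j = 1; j < N - 1; j++)
            A[i][j] = B[i][j];
}

/* Naive fusion (hypothetical, for contrast only): the write to A[i][j]
 * now precedes the reads at (i+1,j) and (i,j+1), so the anti-dependence
 * is violated and later iterations see updated values. */
void jacobi_naive_fused(double A[N][N], double B[N][N]) {
    for (int i = 1; i < N - 1; i++)
        for (int j = 1; j < N - 1; j++) {
            B[i][j] = 0.25 * (A[i-1][j] + A[i+1][j] + A[i][j-1] + A[i][j+1]);
            A[i][j] = B[i][j];
        }
}

/* Fused sweep with array copying: old values of rows i-1 and i are kept
 * in buffers, so every read still sees the pre-sweep values of A. */
void jacobi_fused(double A[N][N], double B[N][N]) {
    double prev[N];  /* pre-sweep values of row i-1 */
    double cur[N];   /* pre-sweep values of row i   */
    memcpy(prev, A[0], sizeof prev);      /* row 0 is a boundary row, never written */
    for (int i = 1; i < N - 1; i++) {
        memcpy(cur, A[i], sizeof cur);    /* save row i before updating it */
        for (int j = 1; j < N - 1; j++) {
            /* row i+1 has not been updated yet, so A[i+1][j] is still old */
            B[i][j] = 0.25 * (prev[j] + A[i+1][j] + cur[j-1] + cur[j+1]);
            A[i][j] = B[i][j];            /* safe: old row i lives in cur */
        }
        memcpy(prev, cur, sizeof prev);   /* old row i becomes old row i-1 */
    }
}

/* Runs jacobi_unfused and `variant` for `sweeps` iterations from the same
 * initial grid and returns the largest absolute difference in A. */
double max_diff_vs_unfused(void (*variant)(double[N][N], double[N][N]), int sweeps) {
    double A1[N][N], B1[N][N], A2[N][N], B2[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A1[i][j] = A2[i][j] = (double)((i * 7 + j * 3) % 11);
            B1[i][j] = B2[i][j] = 0.0;
        }
    for (int s = 0; s < sweeps; s++) {
        jacobi_unfused(A1, B1);
        variant(A2, B2);
    }
    double max = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double d = A1[i][j] - A2[i][j];
            if (d < 0) d = -d;
            if (d > max) max = d;
        }
    return max;
}
```

Because the fused loop adds the same old operands in the same order, `max_diff_vs_unfused(jacobi_fused, k)` should return 0 for any `k`, whereas `jacobi_naive_fused` diverges after the first sweep: it silently turns the Jacobi update into a Gauss-Seidel-style one. The abstract does not say whether the copies generated by the paper's algorithm take this row-buffer form.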

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xue, J. (2005). Aggressive Loop Fusion for Improving Locality and Parallelism. In: Pan, Y., Chen, D., Guo, M., Cao, J., Dongarra, J. (eds) Parallel and Distributed Processing and Applications. ISPA 2005. Lecture Notes in Computer Science, vol 3758. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11576235_28

  • DOI: https://doi.org/10.1007/11576235_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29769-7

  • Online ISBN: 978-3-540-32100-2

  • eBook Packages: Computer Science, Computer Science (R0)