Maximizing loop parallelism and improving data locality via loop fusion and distribution

  • Conference paper

Languages and Compilers for Parallel Computing (LCPC 1993)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 768)

Abstract

Loop fusion is a program transformation that merges multiple loops into one. It is effective for reducing the synchronization overhead of parallel loops and for improving data locality. This paper presents three results for fusion: (1) a new algorithm for fusing a collection of parallel and sequential loops, minimizing parallel loop synchronization while maximizing parallelism; (2) a proof that performing fusion to maximize data locality is NP-hard; and (3) two polynomial-time algorithms for improving data locality. These techniques also apply to loop distribution, which is shown to be essentially equivalent to loop fusion. Our approach is general enough to support other fusion heuristics. Preliminary experimental results validate our approach for improving performance by exploiting data locality and increasing the granularity of parallelism.
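As a minimal illustration of the transformation the abstract describes, the sketch below fuses two adjacent loops over the same index range. It is not taken from the paper: the function names `unfused` and `fused` and the arrays `a`, `b`, `c`, `d` are hypothetical, and the example shows only the basic idea of fusion, not the paper's algorithms.

```python
# Illustrative sketch only; names and arrays are assumptions, not from the paper.

def unfused(b, c):
    """Two separate loops over the same index range."""
    n = len(b)
    a = [0.0] * n
    d = [0.0] * n
    for i in range(n):      # loop 1: produces a[i]
        a[i] = b[i] + 1.0
    for i in range(n):      # loop 2: consumes a[i]
        d[i] = a[i] * c[i]
    return a, d


def fused(b, c):
    """The same computation after fusion: a single loop."""
    n = len(b)
    a = [0.0] * n
    d = [0.0] * n
    for i in range(n):
        a[i] = b[i] + 1.0   # a[i] is written ...
        d[i] = a[i] * c[i]  # ... and reused in the same iteration
    return a, d


if __name__ == "__main__":
    b = [1.0, 2.0, 3.0]
    c = [4.0, 5.0, 6.0]
    print(unfused(b, c) == fused(b, c))
```

Fusion is safe in this sketch because the second loop reads only `a[i]`, which the first statement writes in the same iteration. If the second loop read `a[i + 1]` instead, the fused loop would read a value before it was written, so the two loops could not be merged this way. If both original loops were parallel, the fused version would need one synchronization point rather than two, and each `a[i]` would be reused while it is likely still in cache or a register. Loop distribution is the reverse step, splitting `fused` back into the two loops of `unfused`.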

This research was supported by the Center for Research on Parallel Computation, an NSF Science and Technology Center. Use of the Sequent Symmetry S81 was provided under NSF Cooperative Agreement No. CDA-8619393.

Editor information

Utpal Banerjee, David Gelernter, Alex Nicolau, David Padua

Copyright information

© 1994 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kennedy, K., McKinley, K.S. (1994). Maximizing loop parallelism and improving data locality via loop fusion and distribution. In: Banerjee, U., Gelernter, D., Nicolau, A., Padua, D. (eds) Languages and Compilers for Parallel Computing. LCPC 1993. Lecture Notes in Computer Science, vol 768. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57659-2_18

  • DOI: https://doi.org/10.1007/3-540-57659-2_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-57659-4

  • Online ISBN: 978-3-540-48308-3

  • eBook Packages: Springer Book Archive
