Loop Parallelization Algorithms

Chapter in: Compiler Optimizations for Scalable Parallel Systems

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1808)

Summary

This chapter is devoted to a comparative survey of loop parallelization algorithms. Several such algorithms have been presented in the literature, including those introduced by Allen and Kennedy, Wolf and Lam, Darte and Vivien, and Feautrier. These algorithms make use of different mathematical tools, and they do not rely on the same representation of data dependences. In this chapter, we survey each of these algorithms and assess their power and limitations, both through examples and by stating “optimality” results. An important contribution of this chapter is to characterize which algorithm is the most suitable for a given representation of dependences. This result is of practical interest, as it provides guidance for a parallelizing compiler: given the dependence analysis that is available, the simplest and cheapest parallelization algorithm that remains optimal should be selected.
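As a hand-made illustration of the kind of dependence information these algorithms consume (not an example taken from the chapter itself), the sketch below applies the classic distance-vector rule used by direction/distance-vector methods such as those the chapter surveys: a dependence is carried at the outermost loop level where its distance vector has a nonzero component, and any level that carries no dependence can be marked parallel. All function names here are hypothetical.

```python
# Sketch (assumed, not from the chapter): classify the levels of a loop
# nest as sequential or parallel from a set of dependence distance vectors.

def carried_levels(distance_vectors, depth):
    """Return the set of loop levels (0-based) that carry a dependence.

    A dependence is carried at the first level where its distance
    vector has a nonzero component.
    """
    carried = set()
    for d in distance_vectors:
        for level in range(depth):
            if d[level] != 0:   # first nonzero component carries it
                carried.add(level)
                break
    return carried

def parallel_levels(distance_vectors, depth):
    """Levels that carry no dependence may run in parallel (DOALL)."""
    return [l for l in range(depth)
            if l not in carried_levels(distance_vectors, depth)]

# Example: the statement a[i][j] = a[i-1][j] + 1 in a 2-deep nest
# yields the single distance vector (1, 0): the outer loop (level 0)
# carries the dependence, and the inner loop (level 1) is parallel.
print(parallel_levels([(1, 0)], 2))   # [1]
```

This first-nonzero-component test is only the simplest building block; the algorithms compared in the chapter differ precisely in how much richer their dependence representations (direction vectors, dependence polyhedra, affine relations) are and in how much parallelism they can then extract.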

References

  1. J. R. Allen and K. Kennedy. PFC: a program to convert programs to parallel form. Technical report, Dept. of Math. Sciences, Rice University, TX, March 1982.

  2. J. R. Allen and K. Kennedy. Automatic translation of Fortran programs to vector form. ACM TOPLAS, 9(4):491–542, 1987.

  3. Utpal Banerjee. A theory of loop permutations. In Gelernter, Nicolau, and Padua, editors, Languages and Compilers for Parallel Computing. MIT Press, 1990.

  4. A. J. Bernstein. Analysis of programs for parallel processing. In IEEE Trans. on El. Computers, EC-15, 1966.

  5. Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (pen)-ultimate tiling? Integration, the VLSI Journal, 17:33–51, 1994.

  6. D. Callahan. A Global Approach to Detection of Parallelism. PhD thesis, Dept. of Computer Science, Rice University, Houston, TX, 1987.

  7. J.-F. Collard, D. Barthou, and P. Feautrier. Fuzzy Array Dataflow Analysis. In Proceedings of 5th ACM SIGPLAN Symp. on Principles and practice of Parallel Programming, Santa Barbara, CA, July 1995.

  8. Jean-François Collard. Code generation in automatic parallelizers. In Claude Girault, editor, Proc. Int. Conf. on Application in Parallel and Distributed Computing. IFIP WG 10.3, pages 185–194. North Holland, April 1994.

  9. Jean-François Collard, Paul Feautrier, and Tanguy Risset. Construction of DO loops from systems of affine constraints. Parallel Processing Letters, 5(3):421–436, September 1995.

  10. Alain Darte, Leonid Khachiyan, and Yves Robert. Linear scheduling is nearly optimal. Parallel Processing Letters, 1(2):73–81, 1991.

  11. Alain Darte and Yves Robert. Mapping uniform loop nests onto distributed memory architectures. Parallel Computing, 20:679–710, 1994.

  12. Alain Darte and Yves Robert. Affine-by-statement scheduling of uniform and affine loop nests over parametric domains. J. Parallel and Distributed Computing, 29:43–59, 1995.

  13. Alain Darte, Georges-André Silber, and Frédéric Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Technical Report 96-34, LIP, ENS-Lyon, France, November 1996.

  14. Alain Darte and Frédéric Vivien. Automatic parallelization based on multidimensional scheduling. Technical Report 94-24, LIP, ENS-Lyon, France, September 1994.

  15. Alain Darte and Frédéric Vivien. Optimal fine and medium grain parallelism detection in polyhedral reduced dependence graphs. In Proceedings of PACT’96, Boston, MA, October 1996. IEEE Computer Society Press.

  16. Alain Darte and Frédéric Vivien. Optimal fine and medium grain parallelism detection in polyhedral reduced dependence graphs. Technical Report 96-06, LIP, ENS-Lyon, France, April 1996.

  17. Alain Darte and Frédéric Vivien. On the optimality of Allen and Kennedy’s algorithm for parallelism extraction in nested loops. Journal of Parallel Algorithms and Applications, 1996. Special issue on Optimizing Compilers for Parallel Languages.

  18. Paul Feautrier. Dataflow analysis of array and scalar references. Int. J. Parallel Programming, 20(1):23–51, 1991.

  19. Paul Feautrier. Some efficient solutions to the affine scheduling problem, part I, one-dimensional time. Int. J. Parallel Programming, 21(5):313–348, October 1992.

  20. Paul Feautrier. Some efficient solutions to the affine scheduling problem, part II, multi-dimensional time. Int. J. Parallel Programming, 21(6):389–420, December 1992.

  21. F. Irigoin, P. Jouvelot, and R. Triolet. Semantical interprocedural parallelization: an overview of the PIPS project. In Proceedings of the 1991 ACM International Conference on Supercomputing, Cologne, Germany, June 1991.

  22. F. Irigoin and R. Triolet. Computing dependence direction vectors and dependence cones with linear systems. Technical Report ENSMP-CAI-87-E94, Ecole des Mines de Paris, Fontainebleau (France), 1987.

  23. F. Irigoin and R. Triolet. Supernode partitioning. In Proc. 15th Annual ACM Symp. Principles of Programming Languages, pages 319–329, San Diego, CA, January 1988.

  24. R.M. Karp, R.E. Miller, and S. Winograd. The organization of computations for uniform recurrence equations. Journal of the ACM, 14(3):563–590, July 1967.

  25. W. Kelly, V. Maslov, W. Pugh, E. Rosser, T. Shpeisman, and D. Wonnacott. New user interface for Petit and other interfaces: user guide. University of Maryland, June 1995.

  26. Leslie Lamport. The parallel execution of DO loops. Communications of the ACM, 17(2):83–93, February 1974.

  27. Amy W. Lim and Monica S. Lam. Maximizing parallelism and minimizing synchronization with affine transforms. In Proceedings of the 24th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, January 1997.

  28. Wolfgang Meisl. Practical methods for scheduling and allocation in the polytope model. World Wide Web document, URL: http://brahms.fmi.uni-passau.de/cl/loopo/doc.

  29. R. Schreiber and Jack J. Dongarra. Automatic blocking of nested loops. Technical Report 90-38, The University of Tennessee, Knoxville, TN, August 1990.

  30. Alexander Schrijver. Theory of Linear and Integer Programming. John Wiley and Sons, New York, 1986.

  31. Michael E. Wolf and Monica S. Lam. A loop transformation theory and an algorithm to maximize parallelism. IEEE Trans. Parallel Distributed Systems, 2(4):452–471, October 1991.

  32. M. Wolfe. Optimizing Supercompilers for Supercomputers. PhD thesis, Dept. of Computer Science, University of Illinois at Urbana-Champaign, October 1982.

  33. Michael Wolfe. Optimizing Supercompilers for Supercomputers. MIT Press, Cambridge MA, 1989.

  34. Michael Wolfe. TINY, a loop restructuring research tool. Oregon Graduate Institute of Science and Technology, December 1990.

  35. Michael Wolfe. High Performance Compilers For Parallel Computing. Addison-Wesley Publishing Company, 1996.

  36. Jingling Xue. Automatic non-unimodular transformations of loop nests. Parallel Computing, 20(5):711–728, May 1994.

  37. Hans Zima and Barbara Chapman. Supercompilers for Parallel and Vector Computers. ACM Press, 1990.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Darte, A., Robert, Y., Vivien, F. (2001). Loop Parallelization Algorithms. In: Pande, S., Agrawal, D.P. (eds) Compiler Optimizations for Scalable Parallel Systems. Lecture Notes in Computer Science, vol 1808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45403-9_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41945-7

  • Online ISBN: 978-3-540-45403-8

  • eBook Packages: Springer Book Archive
