Summary
This chapter is devoted to a comparative survey of loop parallelization algorithms. Several algorithms have been proposed in the literature, notably those of Allen and Kennedy, Wolf and Lam, Darte and Vivien, and Feautrier. These algorithms rely on different mathematical tools, and they do not use the same representation of data dependences. In this chapter, we survey each of these algorithms and assess their power and limitations, both through examples and by stating "optimality" results. An important contribution of this chapter is to characterize which algorithm is the most suitable for a given representation of dependences. This result is of practical interest: it guides a parallelizing compiler to select, given the dependence analysis that is available, the simplest and cheapest parallelization algorithm that remains optimal.
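The role of the dependence representation can be made concrete with a small sketch. The following Python fragment (illustrative only; the function names and the level-based test are our own simplification, in the spirit of the dependence-level reasoning used by Allen and Kennedy's algorithm) assumes uniform dependences given as distance vectors, and marks a loop as parallel when no dependence is carried at its level:

```python
# Illustrative sketch: deciding which loops of a nest may run in parallel,
# given uniform dependences represented as distance vectors.
# A dependence is "carried" at the level of the first nonzero component of
# its distance vector; a loop carrying no dependence is parallel.

def carrying_level(distance):
    """Return the 1-based level of the first nonzero component,
    or None for the zero vector (a loop-independent dependence)."""
    for level, d in enumerate(distance, start=1):
        if d != 0:
            return level
    return None

def parallel_loops(distances, depth):
    """Return the 1-based levels of the loops that carry no dependence."""
    carried = {carrying_level(d) for d in distances} - {None}
    return [k for k in range(1, depth + 1) if k not in carried]

# A 2-deep nest with one dependence of distance (1, 0):
# the outer loop carries it, so the inner loop is parallel.
print(parallel_loops([(1, 0)], 2))          # -> [2]

# Adding a dependence of distance (0, 1) also serializes the inner loop.
print(parallel_loops([(1, 0), (0, 1)], 2))  # -> []
```

A richer dependence representation (direction vectors, dependence polyhedra, or exact affine dependences) refines this test, and the chapter's point is precisely that each algorithm is matched to one such representation.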
References
J. R. Allen and K. Kennedy. PFC: a program to convert programs to parallel form. Technical report, Dept. of Math. Sciences, Rice University, TX, March 1982.
J. R. Allen and K. Kennedy. Automatic translations of Fortran programs to vector form. ACM Transactions on Programming Languages and Systems, 9:491–542, 1987.
Utpal Banerjee. A theory of loop permutations. In Gelernter, Nicolau, and Padua, editors, Languages and Compilers for Parallel Computing. MIT Press, 1990.
A. J. Bernstein. Analysis of programs for parallel processing. IEEE Transactions on Electronic Computers, EC-15, 1966.
Pierre Boulet, Alain Darte, Tanguy Risset, and Yves Robert. (pen)-ultimate tiling? Integration, the VLSI Journal, 17:33–51, 1994.
D. Callahan. A Global Approach to Detection of Parallelism. PhD thesis, Dept. of Computer Science, Rice University, Houston, TX, 1987.
J.-F. Collard, D. Barthou, and P. Feautrier. Fuzzy Array Dataflow Analysis. In Proceedings of 5th ACM SIGPLAN Symp. on Principles and practice of Parallel Programming, Santa Barbara, CA, July 1995.
Jean-François Collard. Code generation in automatic parallelizers. In Claude Girault, editor, Proc. Int. Conf. on Application in Parallel and Distributed Computing. IFIP WG 10.3, pages 185–194. North Holland, April 1994.
Jean-François Collard, Paul Feautrier, and Tanguy Risset. Construction of DO loops from systems of affine constraints. Parallel Processing Letters, 5(3):421–436, September 1995.
Alain Darte, Leonid Khachiyan, and Yves Robert. Linear scheduling is nearly optimal. Parallel Processing Letters, 1(2):73–81, 1991.
Alain Darte and Yves Robert. Mapping uniform loop nests onto distributed memory architectures. Parallel Computing, 20:679–710, 1994.
Alain Darte and Yves Robert. Affine-by-statement scheduling of uniform and affine loop nests over parametric domains. J. Parallel and Distributed Computing, 29:43–59, 1995.
Alain Darte, Georges-André Silber, and Frédéric Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Technical Report 96-34, LIP, ENS-Lyon, France, November 1996.
Alain Darte and Frédéric Vivien. Automatic parallelization based on multidimensional scheduling. Technical Report 94-24, LIP, ENS-Lyon, France, September 1994.
Alain Darte and Frédéric Vivien. Optimal fine and medium grain parallelism detection in polyhedral reduced dependence graphs. In Proceedings of PACT’96, Boston, MA, October 1996. IEEE Computer Society Press.
Alain Darte and Frédéric Vivien. Optimal fine and medium grain parallelism detection in polyhedral reduced dependence graphs. Technical Report 96-06, LIP, ENS-Lyon, France, April 1996.
Alain Darte and Frédéric Vivien. On the optimality of Allen and Kennedy’s algorithm for parallelism extraction in nested loops. Journal of Parallel Algorithms and Applications, 1996. Special issue on Optimizing Compilers for Parallel Languages.
Paul Feautrier. Dataflow analysis of array and scalar references. Int. J. Parallel Programming, 20(1):23–51, 1991.
Paul Feautrier. Some efficient solutions to the affine scheduling problem, part I, one-dimensional time. Int. J. Parallel Programming, 21(5):313–348, October 1992.
Paul Feautrier. Some efficient solutions to the affine scheduling problem, part II, multi-dimensional time. Int. J. Parallel Programming, 21(6):389–420, December 1992.
F. Irigoin, P. Jouvelot, and R. Triolet. Semantical interprocedural parallelization: an overview of the PIPS project. In Proceedings of the 1991 ACM International Conference on Supercomputing, Cologne, Germany, June 1991.
F. Irigoin and R. Triolet. Computing dependence direction vectors and dependence cones with linear systems. Technical Report ENSMP-CAI-87-E94, Ecole des Mines de Paris, Fontainebleau (France), 1987.
F. Irigoin and R. Triolet. Supernode partitioning. In Proc. 15th Annual ACM Symp. Principles of Programming Languages, pages 319–329, San Diego, CA, January 1988.
R.M. Karp, R.E. Miller, and S. Winograd. The organization of computations for uniform recurrence equations. Journal of the ACM, 14(3):563–590, July 1967.
W. Kelly, V. Maslov, W. Pugh, E. Rosser, T. Shpeisman, and D. Wonnacott. New user interface for Petit and other interfaces: user guide. University of Maryland, June 1995.
Leslie Lamport. The parallel execution of DO loops. Communications of the ACM, 17(2):83–93, February 1974.
Amy W. Lim and Monica S. Lam. Maximizing parallelism and minimizing synchronization with affine transforms. In Proceedings of the 24th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, January 1997.
Wolfgang Meisl. Practical methods for scheduling and allocation in the polytope model. World Wide Web document, URL: http://brahms.fmi.uni-passau.de/cl/loopo/doc.
R. Schreiber and Jack J. Dongarra. Automatic blocking of nested loops. Technical Report 90-38, The University of Tennessee, Knoxville, TN, August 1990.
Alexander Schrijver. Theory of Linear and Integer Programming. John Wiley and Sons, New York, 1986.
Michael E. Wolf and Monica S. Lam. A loop transformation theory and an algorithm to maximize parallelism. IEEE Trans. Parallel Distributed Systems, 2(4):452–471, October 1991.
M. Wolfe. Optimizing Supercompilers for Supercomputers. PhD thesis, Dept. of Computer Science, University of Illinois at Urbana-Champaign, October 1982.
Michael Wolfe. Optimizing Supercompilers for Supercomputers. MIT Press, Cambridge MA, 1989.
Michael Wolfe. TINY, a loop restructuring research tool. Oregon Graduate Institute of Science and Technology, December 1990.
Michael Wolfe. High Performance Compilers For Parallel Computing. Addison-Wesley Publishing Company, 1996.
Jingling Xue. Automatic non-unimodular transformations of loop nests. Parallel Computing, 20(5):711–728, May 1994.
Hans Zima and Barbara Chapman. Supercompilers for Parallel and Vector Computers. ACM Press, 1990.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
Cite this chapter
Darte, A., Robert, Y., Vivien, F. (2001). Loop Parallelization Algorithms. In: Pande, S., Agrawal, D.P. (eds) Compiler Optimizations for Scalable Parallel Systems. Lecture Notes in Computer Science, vol 1808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45403-9_5
Print ISBN: 978-3-540-41945-7
Online ISBN: 978-3-540-45403-8