Skip to main content

Complexity of Multi-dimensional Loop Alignment

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2285))

Abstract

Loop alignment is a classical program transformation that can enable the fusion of parallel loops, thereby increasing locality and reducing the number of synchronizations. Although the problem is quite old in the one-dimensional case (i.e., no nested loops), it came back recently — with a multi-dimensional form — when trying to refine parallelization algorithms based on multi-dimensional schedules. The main result of this paper is that, unlike the problem in 1D, finding a multi-dimensional shift of statements that makes an innermost loop parallel is strongly NP-complete. Nevertheless, we identify some polynomially-solvable cases that can occur in practice and we show that the general problem can be stated as a system of integer linear constraints.

Shifting is sometimes called alignment. But we prefer to make a distinction between shifting, a technique that consists in shifting some statements with respect to the loop iterations, and alignment, which is one particular goal achieved by this technique.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. J. Allen, D. Callahan, and K. Kennedy. Automatic decomposition of scientific programs for parallel execution. In PoPL’87, pages 63–76, Munich, Jan. 1987.

    Google Scholar 

  2. P. Boulet, A. Darte, G.-A. Silber, and F. Vivien. Loop parallelization algorithms: Fromparallelismextraction to code generation. Journal of Parallel Computing, 24(3), 1998. Special issue on Languages and Compilers for Parallel Computers.

    Google Scholar 

  3. J.-F. Collard, P. Feautrier, and T. Risset. Construction of DO loops from systems of affine constraints. Parallel Processing Letters (PPL), 5(3):421–436, Sept. 1995.

    Article  Google Scholar 

  4. A. Darte. On the complexity of loop fusion. In PACT’99, pages 149–157, Newport Beach, CA, Oct. 1999.

    Google Scholar 

  5. A. Darte and G. Huard. Loop shifting for loop parallelization. Technical Report RR2000-22, LIP, ENS-Lyon, France, May 2000.

    Google Scholar 

  6. A. Darte, Y. Robert, and F. Vivien. Scheduling and Automatic Parallelization. Birkhauser, 2000. ISBN 0-8176-4149-1.

    Google Scholar 

  7. A. Darte, G.-A. Silber, and F. Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. PPL, 7(4):379–392, 1997.

    Google Scholar 

  8. P. Feautrier. Some efficient solutions to the affine scheduling problem, Part II. Internationl Journal of Parallel Programming, 21(6):389–420, Dec. 1992.

    Google Scholar 

  9. M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, 1979.

    Google Scholar 

  10. W. Kelly and B. Pugh. Selecting affine mappings based on performance estimation. Parallel Processing Letters, 4(3):205–219, Sept. 1994.

    Article  Google Scholar 

  11. W. Kelly, W. Pugh, and E. Rosser. Code generation for multiple mappings. In Frontiers of Massively Parallel Computation, pages 332–341, McLean, 1995.

    Google Scholar 

  12. K. Kennedy and K. S. McKinley. Loop distribution with arbitrary control flow. In Supercomputing’90, Aug. 1990.

    Google Scholar 

  13. L. Lamport. The parallel execution of DO loops. Communications of the ACM, 17(2):83–93, Feb. 1974.

    Article  MATH  MathSciNet  Google Scholar 

  14. C. Lang, N. Passos, and E. Sha. Polynomial-time nested loop fusion with full parallelism. In ICPP’96, volume 3, pages 9–16, Bloomingdale, IL, 1996.

    Google Scholar 

  15. C. E. Leiserson and J. B. Saxe. Retiming synchronous circuitry. Algorithmica, 6(1):5–35, 1991.

    Article  MATH  MathSciNet  Google Scholar 

  16. A. W. Lim and M. S. Lam. Maximizing parallelism and minimizing synchronization with affine transforms. In PoPL’97. ACM Press, Jan. 1997.

    Google Scholar 

  17. K. S. McKinley and K. Kennedy. Maximizing loop parallelism and improving data locality via loop fusion and distribution. LCPC’93, volume 768 of Lecture Notes in Computer Science, pages 301–320. Springer-Verlag, 1993.

    Google Scholar 

  18. K. Okuda. Cycle shrinking by dependence reduction. In Euro-Par’96, volume 1123 of Lecture Notes in Computer Science, pages 398–401. Springer-Verlag, 1996.

    Google Scholar 

  19. J. K. Peir. Program Partitioning and Synchronization on Multiprocessor Systems. PhD thesis, University of Illinois at Urbana-Champaign, 1986.

    Google Scholar 

  20. J. K. Peir and R. Cytron. Minimum distance: A method for partitioning recurrences for multiprocessors. IEEE Trans. on Computers, 38(8):1203–1211, Aug. 1989.

    Article  Google Scholar 

  21. F. Quilleré, S. Rajopadhye, and D. Wilde. Generation of efficient nested loops from polyhedra. International Journal of Parallel Programming, 28(5):469–498, 2000.

    Article  Google Scholar 

  22. R. Schreiber, S. Aditya, B. R. Rau, V. Kathail, S. Mahlke, S. Abraham, and G. Snider. High-level synthesis of nonprogrammable hardware accelerators. In Application-Specific Systems, Architectures, and Processors, pages 113–124, 2000.

    Google Scholar 

  23. M. E. Wolf and M. S. Lam. A loop transformation theory and an algorithm to maximize parallelism. IEEE TPDS, 2(4):452–471, Oct. 1991.

    Google Scholar 

  24. M. Wolfe. High Performance Compilers for Parallel Computing. Addison-Wesley Publishing Company, 1996.

    Google Scholar 

  25. H. Zima and B. Chapman. Supercompilers for Parallel and Vector Computers. ACM Press, 1990.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Darte, A., Huard, G. (2002). Complexity of Multi-dimensional Loop Alignment. In: Alt, H., Ferreira, A. (eds) STACS 2002. STACS 2002. Lecture Notes in Computer Science, vol 2285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45841-7_14

Download citation

  • DOI: https://doi.org/10.1007/3-540-45841-7_14

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43283-8

  • Online ISBN: 978-3-540-45841-8

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics