Loop Shifting for Loop Compaction

  • Alain Darte
  • Guillaume Huard
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1863)


The idea of decomposed software pipelining is to decouple the software pipelining problem into a cyclic scheduling problem without resource constraints and an acyclic scheduling problem with resource constraints. In terms of loop transformation and code motion, the technique can be formulated as a combination of loop shifting and loop compaction. Loop shifting amounts to move statements between iterations thereby changing some loop independent dependences into loop carried dependences and vice versa. Then, loop compaction schedules the body of the loop considering only loop independent dependences, but taking into account the details of the target architecture. In this paper, we show how loop shifting can be optimized so as to minimize both the length of the critical path and the number of dependences for loop compaction. Both problems (and the combination) are polynomially solvable with fast graph algorithms, the first one by using an algorithm due to Leiserson and Saxe, the second one by designing a variant of minimum-cost flow algorithms.


Critical Path Dependence Graph Clock Period List Schedule Loop Body 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. 1.
    Alexander Aiken and Alexandru Nicolau. Perfect pipelining; a new loop optimization technique. In 1988 European Symposium on Programming, volume 300 of Lecture Notes in Computer Science, pages 221–235. Springer Verlag, 1988.Google Scholar
  2. 2.
    Vicki H. Allan, Reese B. Jones, Randall M. Lee, and Stephen J. Allan. Software pipelining. ACM Computing Surveys, 27(3):367–432, September 1995.Google Scholar
  3. 3.
    Tsing-Fa Lee Allen, C.-H Wu Wei-Jeng Chen, Wei-Kai Cheng, and Youn-Long Lin. On the relationship between sequential logic retiming and loop folding. In Proceedings of the SASIMI’93, pages 384–393, Nara, Japan, October 1993.Google Scholar
  4. 4.
    P.Y. Calland, A. Darte, and Y. Robert. Circuit retiming applied to decomposed software pipelining. IEEE Transactions on Parallel and Distributed Systems, 9(1):24–35, January 1998.Google Scholar
  5. 5.
    L.-F. Chao, A. LaPaugh, and E. Sha. Rotation scheduling: a loop pipelining algorithm. In 30th ACM/IEEE design automation conference, pages 566–572, 1993.Google Scholar
  6. 6.
    E. G. Coffman. Computer and job-shop scheduling theory. John Wiley, 1976.Google Scholar
  7. 7.
    Alain Darte and Guillaume Huard. Loop shifting for loop compaction. Technical Report RR1999-29, LIP, ENS-Lyon, France, May 1999.Google Scholar
  8. 8.
    Alain Darte, Georges-André Silber, and Frédéric Vivien. Combining retiming and scheduling techniques for loop parallelization and loop tiling. Parallel Processing Letters, 7(4):379–392, 1997.CrossRefGoogle Scholar
  9. 9.
    Carole Dulong. The IA-64 architecture at work. Computer, 31(7):24–32, July 1998.Google Scholar
  10. 10.
    F. Gasperoni and U. Schwiegelshohn. Generating close to optimum loop schedules on parallel processors. Parallel Processing Letters, 4(4):391–403, 1994.CrossRefMathSciNetGoogle Scholar
  11. 11.
    M. Gondran and M. Minoux. Graphs and Algorithms. John Wiley, 1984.Google Scholar
  12. 12.
    C. Hanen and A. Munier. Cyclic scheduling on parallel processors: an overview. In P. Chrétienne, E. G. Coffman, J. K. Lenstra, and Z. Liu, editors, Scheduling Theory and its Applications. John Wiley & Sons Ltd, 1995.Google Scholar
  13. 13.
    John L. Hennessy and David A. Patterson. Computer architecture: a quantitative approach (2nd edition). Morgan-Kaufmann, 1996.Google Scholar
  14. 14.
    R. A. Huff. Lifetime-sensitive modulo scheduling. In Conference on programming language design and implementation (PLDI’93), pages 258–267. ACM, 1993.Google Scholar
  15. 15.
    Suneel Jain. Circular scheduling. In Conference on programming language design and implementation (PLDI’91), pages 219–228. ACM, 1991.Google Scholar
  16. 16.
    Monica S. Lam. Software pipelining; an effective scheduling technique for VLIW machines. In SIGPLAN’88 Conference on Programming Language, Design and Implementation, pages 318–328, Atlanta, GA, 1988. ACM Press.Google Scholar
  17. 17.
    C.E. Leiserson and J.B. Saxe. Retiming synchronous circuitry. Algorithmica, 6(1):5–35, 1991.zbMATHCrossRefMathSciNetGoogle Scholar
  18. 18.
    J. Llosa, A. González, E. Ayguadé, and M. Valero. Swing modulo scheduling: a lifetime-sensitive approach. In Conference on Parallel Architectures and Compilation Techniques (PACT’96), Boston, MA, 1996. IEEE Computer Society Press.Google Scholar
  19. 19.
    Soo-Mook Moon and Kemal Ebcioğlu. An efficient resource-constrained global scheduling technique for superscalar and VLIW processors. In 25th annual international symposium on Microarchitecture, pages 55–71, 1992.Google Scholar
  20. 20.
    Andreas I. Moshovos, Scott E. Breach, T.N. Vijaykumar, and Gurindar S. Sohi. Dynamic speculation and synchronization of data dependences. In 24th International Symposium on Computer Architecture (ISCA’97), pages 181–193, Jun 1997.Google Scholar
  21. 21.
    M. Rajagopalan and V. H. Allan. Specification of software pipelining using petri nets. International Journal of Parallel Programming, 22(3):273–301, 1994.CrossRefGoogle Scholar
  22. 22.
    B. R. Rau. Iterative modulo scheduling: an algorithm for software pipelining. In 27th annual international symposium on Microarchitecture, pages 63–74, 1994.Google Scholar
  23. 23.
    B. R. Rau. Iterative modulo scheduling. International Journal of Parallel Programming, 24(1):3–64, 1996.MathSciNetGoogle Scholar
  24. 24.
    B. R. Rau and C. D. Glaeser. Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing. In Proceedings of the Fourteenth Annual Workshop of Microprogramming, pages 183–198, October 1981.Google Scholar
  25. 25.
    U. Schwiegelshohn, F. Gasperoni, and K. Ebcioğlu. On optimal parallelization of arbitrary loops. Journal of Parallel and Distributed Computing, 11:130–134, 1991.CrossRefGoogle Scholar
  26. 26.
    Georges-André Silber and Alain Darte. The Nestor library: A tool for implementing Fortran source to source transformations. In High Performance Computing and Networking (HPCN’99), volume 1593 of Lecture Notes in Computer Science, pages 653–662. Springer Verlag, April 1999.CrossRefGoogle Scholar
  27. 27.
    J. Wang, C. Eisenbeis, M. Jourdan, and B. Su. Decomposed software pipelining. International Journal of Parallel Programming, 22(3):351–373, 1994.CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Alain Darte
    • 1
  • Guillaume Huard
    • 1
  1. 1.LIPENS-LyonLyonFrance

Personalised recommendations