An Unfolding-Based Loop Optimization Technique

  • Litong Song
  • Krishna Kavi
  • Ron Cytron
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2826)


Loops are the source of many optimizations for improving program performance, particularly on modern high-performance architectures and on vector and multithreaded systems. Techniques such as loop-invariant code motion, loop unrolling, and loop peeling have demonstrated their utility in compiler optimization. However, many of these techniques apply only in very limited cases, when the loops are "well-structured" and easy to analyze. For instance, loop-invariant code motion works only when invariant code is present inside the loop; loop unrolling and loop peeling work effectively when the array references are either constants or affine functions of the index variable. It is our contention that many opportunities are overlooked when optimizations are limited to well-structured loops. In many cases, even "badly-structured" loops can be transformed into well-structured loops. As a case in point, we show how some loop-dependent code can be turned into loop-invariant code by transforming the loops. The technique described in this paper relies on unfolding the loop for several initial iterations so that more opportunities are exposed for existing compiler optimization techniques such as loop-invariant code motion, loop peeling, and loop unrolling.
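
The idea of exposing invariant code by unfolding initial iterations can be illustrated with a minimal sketch. It is not the paper's algorithm, only a hand-worked instance under stated assumptions: the function `f`, the parameters `x0`, `c`, `n`, and both loop bodies are hypothetical and chosen for illustration. In `original`, the call `f(x)` looks loop-dependent because `x` is reassigned in the body, yet `x` holds the same value in every iteration after the first. Unfolding iteration 0 makes that visible, and ordinary loop-invariant code motion can then hoist the call.

```python
def f(x):
    # Hypothetical stand-in for a costly, side-effect-free computation.
    return x * x + 1


def original(x0, c, n):
    # f(x) is not invariant as written: x is x0 in iteration 0 and c afterwards.
    x = x0
    total = 0
    for i in range(n):
        total += f(x) + i
        x = c
    return total


def unfolded(x0, c, n):
    # Same computation with the first iteration unfolded out of the loop.
    total = 0
    if n > 0:
        # Iteration 0, executed once outside the loop with x == x0.
        total += f(x0) + 0
        # From iteration 1 onward x == c, so f(c) is invariant and is hoisted.
        fc = f(c)
        for i in range(1, n):
            total += fc + i
    return total
```

In this sketch the remaining loop calls `f` zero times instead of `n - 1` times, and the guard `n > 0` preserves the behavior of a loop that executes no iterations.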


Affine Function; Control Dependence; Compiler Optimization; Instruction Level Parallelism; Dependence Edge
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Litong Song
    • 1
  • Krishna Kavi
    • 1
  • Ron Cytron
    • 2
  1. Department of Computer Science, University of North Texas, Denton, USA
  2. Department of Computer Science and Engineering, Washington University, St. Louis, USA
