Skip to main content
Log in

Abstract

Modulo scheduling is a framework within which algorithms for software pipelining innermost loops may be defined. The framework specifies a set of constraints that must be met in order to achieve a legal modulo schedule. A wide variety of algorithms and heuristics can be defined within this framework. Little work has been done to evaluate and compare alternative algorithms and heuristics for modulo scheduling from the viewpoints of schedule quality as well as computational complexity. This, along with a vague and unfounded perception that modulo scheduling is computationally expensive as well as difficult to implement, have inhibited its incorporation into product compilers. This paper presents iterative modulo scheduling, a practical algorithm that is capable of dealing with realistic machine models. The paper also characterizes the algorithm in terms of the quality of the generated schedules as well the computational expense incurred.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. J. A. Fisher, Trace scheduling: a technique for global microcode compaction, IEEE Transactions on Computers C-30:478–490 (1981).

    Article  Google Scholar 

  2. P. G. Lowney, S. M. Freudenberger, T. J. Karzes, W. D. Lichtenstein, R. P. Nix, J. S. O’Donnell, and J. C. Ruttenberg, The Multiflow trace scheduling compiler, The Journal of Supercomputing 7(1/2):51–142 (1993).

    Article  Google Scholar 

  3. W. W. Hwu, S. A. Mahlke, W. Y. Chen, P. P. Chang, N. J. Warter, R. A. Bringmann, R. G. Ouellette, R. E. Hank, T. Kiyohara, G. E. Haab, J. G. Holm, and D. M. Lavery, The superblock: an effective technique for VLIW and superscalar compilation, The Journal of Supercomputing 7(1/2):229–248 (1993).

    Article  Google Scholar 

  4. A. E. Charlesworth, An approach to scientific array processing: the architectural design of the AP-120B/FPS-164 Family, Computer 14(9):18–27 (1981).

    Article  Google Scholar 

  5. K. Ebcioglu, A compilation technique for software pipelining of loops with conditional jumps, Proc. of the 20th Ann. Workshop on Microprogramming, Colorado Springs, Colorado, pp. 69–79 (1987).

    Chapter  Google Scholar 

  6. K. Ebcioglu and T. Nakatani, A new compilation technique for parallelizing loops with unpredictable branches on a VLIW architecture, In Languages and Compilers for Parallel Computing, D. Gelernter, A. Nicolau, and D. Padua, eds., Pitman/The MIT Press, pp. 213–229 (1989).

    Google Scholar 

  7. S. Jain, Circular scheduling: a new technique to perform software pipelining Proc. of the ACM SIGPLAN ’91 Conf. on Programming Language Design and Implementation, pp. 219–228 (1991).

  8. F. Gasperoni and U. Schwiegeishohn, Scheduling loops on parallel processors: a simple algorithm with close to optimum performance, Proc. of the Int’l. Conf CONPAR ’92, pp. 625–636, (1992).

  9. S.-M. Moon and K. Ebcioglu, An efficient resource-constrained global scheduling technique for superscalar and VLIW processors, Proc. of the 25th Ann. Int’l. Symp. on Microarchitecture, Portland, Oregon, (1992).

    Google Scholar 

  10. M. Tokoro, T. Takizuka, E. Tamura, and I. Yamaura, A technique of global optimization of microprograms, Proc. of the 11th Ann. Workshop on Microprogramming (Asilomar, California) pp. 41–50 (1978)

    Google Scholar 

  11. A. Aiken and A. Nicolau, A realistic resource-constrained software pipelining algorithm, In Advances in Languages and Compilers for Parallel Processing, A. Nicolau, D. Gelernter, T. Gross and D. Padua, eds., Pitman/The MIT Press, pp. 274–290 (1991).

    Google Scholar 

  12. M. Rajagopalan and V. H. Allan, Efficient scheduling of fine-grain parallelism in loops, Proc. of the 26th Int’l. Symp. on Microarchitecture, Austin, Texas, pp. 2–11 (1993).

    Chapter  Google Scholar 

  13. V. H. Allan, U. R. Shah, and K. M. Reddy, Petri Net versus modulo scheduling for software pipelining, Proc. of the 28th Ann. Int’l. Symp. on Microarchitecture, Ann Arbor, Michigan (1995).

    Google Scholar 

  14. T. Muller, Employing finite automata for resource scheduling, Proc. of the 26th Int’l. Symp. on Microarchitecture, Austin, Texas, pp. 12–20 (1993).

    Chapter  Google Scholar 

  15. T. A. Proebsting and C. W. Fraser, Detecting pipeline hazards quickly, Proc. of the 21st Ann. ACM Symp. on Principles of Programming Languages, Portland, Oregon, pp. 280–286 (1994).

    Google Scholar 

  16. V. Bala and N. Rubin, Efficient instruction scheduling using finite state automata, Proc. of the 28th Ann. Int’l. Symp. on Microarchitecture, Ann Arbor, Michigan (1995).

    Google Scholar 

  17. B. R. Rau and C. D. Glaeser, Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific Computing, Proc. of the 14th Ann. Workshop on Microprogramming pp. 183–198 (1981).

  18. B. R. Rau, M. S. Schlansker, and P. P. Tirumalai, Code generation Schemas for modulo scheduled loops, Proc. of the 25th Ann. Int’l. Symp. on Microarchitecture, Portland, Oregon, pp. 158–169. (1992).

    Chapter  Google Scholar 

  19. D. M. Lavery and W. W. Hwu, Unrolling-based optimizations for software pipelining, Proc. of the 28th Ann. Int’l. Symp. on Microarchitecture, Ann Arbor, Michigan (1995).

    Google Scholar 

  20. S. A. Mahlke, D. C. Lin, W. Y. Chen, R. E. Hank, and R. A. Bringmann, Effective compiler support for predicated execution using the hyperblock, Proc. of the 25th Ann. Int’l. Symp. on Microarchitecture, pp. 45–54 (1992).

  21. S. A. Mahlke, R. E. Hank, R. A. Bringmann, J. C. Gyllenhaal, D. M. Gallagher, and W. W. Hwu, Characterizing the impact of predicated execution on branch prediction, Proc. of the 27th Int’l. Symp. on Microarchitecture, San Jose, California, pp. 217–227 (1994).

    Google Scholar 

  22. B. R. Rau, Data flow and dependence analysis for instruction level parallelism, In Fourth Int’l. Workshop on Languages and Compilers for Parallel Computing, U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, eds., Springer-Verlag, pp. 236–250 (1992).

    Chapter  Google Scholar 

  23. J. C. Dehnert and R. A. Towle, Compiling for the Cydra 5, The Journal of Super Computing 7(1/2): 181–228 (1993).

    Article  Google Scholar 

  24. D. Callahan, S. Carr, and K. Kennedy. Improving Register Allocation for Subscripted Variables, Proc. of the ACM SIGPLAN ’90 Conf. on Programming Language Design and Implementation, pp. 53–65 (1990).

  25. E. Duesterwald, R. Gupta, and M. L. Soffa, A practical dataflow framework for array reference analysis and its use in optimizations, Proc. of the SIGPLAN ’93 Conf. on Programming Language Design and Implementation, Albuquerque, New Mexico, pp. 68–77 (1993).

    Google Scholar 

  26. J. R. Allen, K. Kennedy, C. Porterfield, and J. Warren. Conversion of control dependence to data dependence, Proc. of the Tenth Ann. ACM Symp. on Principles of Programming Languages, pp. 177–189 (1983).

  27. J. C. H. Park and M. S. Schlansker, On predicated execution, Technical Report HPL-91-58, Hewlett Packard Laboratories, 1991.

    Google Scholar 

  28. B. R. Rau, D. W. L. Yen, W. Yen, and R. A. Towle, The Cydra 5 departmental supercomputer: design philosophies, decisions and trade-offs, Computer 22(1): 12–35 (1989).

    Article  Google Scholar 

  29. V. Kathail, M. Schlansker, and B. R. Rau. HPL PlayDoh Architecture Specification: Version 1.0, Technical Report HPL-93-80, Hewlett-Packard Laboratories, 1993.

    Google Scholar 

  30. G. R. Beck, D. W. L. Yen, and T. L. Anderson, The Cydra 5 mini-supercomputer: architecture and implementation, The Journal of Supercomputing 7(1/2):143–180 (1993).

    Article  Google Scholar 

  31. P. Tirumalai, M. Lee, and M. S. Schlansker, Parallelization of loops with exits on pipelined architectures Proc. of the Supercomputing ’90, pp. 200–212 (1990).

  32. S. A. Mahlke, W. Y. Chen, R. A. Bringmann, R. E. Hank, W. W. Hwu, B. R. Rau, and M. S. Schlansker, Sentinel scheduling: a model for compiler-controlled speculative execution. ACM Transactions on Computer Systems 11(4):376–408 (1993).

    Article  Google Scholar 

  33. M. Schlansker and V. Kathail, Acceleration of first and higher order recurrences on processors with instruction level parallelism, Proc. of the Sixth Annual Workshop on Languages and Compilers for Parallel Computing, Portland, Oregon (1993).

    Google Scholar 

  34. M. S. Schlansker, V. Kathail, and S. Anik, Height reduction of control recurrences for ILP processors, Proc. of the 27th Ann.. Int’l. Symp. on Microarchitecture, San Jose, California, pp. 32–39 (1994).

    Google Scholar 

  35. A. Eichenberger, E. S. Davidson, and S. G. Abraham, Minimum register requirements for a modulo schedule, Proc. of the 27th Ann. Int’l. Symp. on Microarchitecture San Jose, California, pp. 75–84 (1994).

    Google Scholar 

  36. M. Lam, Software pipelining: an effective scheduling technique for VLIW machines, Proc. of the ACM SIGPLAN ’88 Conf. on Programming Language Design and Implementation, pp. 318–327 (1988).

  37. B. R. Rau, M. Lee, P. Tirumalai, and M. S. Schlansker, Register allocation for software pipelined loops, Proc. of the SIGPLAN ’92 Conf. on Language Design and Implementation, San Francisco (1992).

    Google Scholar 

  38. N. J. Warter, S. A. Mahlke, W. W. Hwu, and B. R. Rau, Reverse if-conversion, Proc. of the SIGPLAN ’93 Conf. on Programming Language Design and Implementation, Albuquerque. New Mexico, pp. 290–299 (1993).

    Google Scholar 

  39. S. Ramakrishnan, Software pipelining in PA-RISC compilers, Hewlett-Packard Journal, pp. 39–45 (1992).

  40. P. Y. T. Hsu, Highly Concurrent Scalar Processing. Ph.D. Thesis, University of Illinois, Urbana-Champaign, 1986.

    Google Scholar 

  41. N. J. Warter, J. W. Bockhaus, G. E. Haab, and K. Subramanian, Enhanced modulo scheduling for loops with conditional branches, Proc. of the The 25th Ann. Int’l. on Microarchitecture, Portland, Oregon, pp. 170–179. (1992)

    Google Scholar 

  42. N. J. Warter, D. M. Lavery, and W. W. Hwu, The benefit of predicated execution for software pipelining, Proc. of the 26th Ann. Hawaii Int’l. Conf. on System Sciences, Hawaii (1993).

    Google Scholar 

  43. R. A. Huff, Lifetime-sensitive module scheduling, Proc. of the SIGPLAN ’93 Conf. on Programming Language Design and Implementation, Albuquerque, New Mexico, pp. 258–267 (1993).

    Google Scholar 

  44. J. W. Bockhaus, An Implementation of GURPR*: A Software Pipelining Algorithm, M.S. Thesis. University of Illinois at Urbana-Champaign, 1992.

    Google Scholar 

  45. B. Su and J. Wang, GURPR*: A new global software pipelining algorithm. Proc. of the 24th Ann. Symp. on Microarchitecture, Albuquerque, New Mexico, pp. 212–216 (1991).

    Chapter  Google Scholar 

  46. B. R. Rau, Iterative modulo scheduling: an algorithm for software pipelining loops, Proc. of the 27th Ann. Int’l. Symp. on Microarchitecture, San Jose, California, pp. 63–74 (1994).

    Google Scholar 

  47. J. Ferrante, K. J. Ottenstein and J. D. Warren, The program dependence graph and its use in optimization, ACM Transactions on Programming Languages and Systems 9(3): 319–349 (1987).

    Article  MATH  Google Scholar 

  48. H. Zima and B. Chapman, Supercompilers for Parallel and Vector Computers, Addison-Wesley, Reading, Massachussetts, (1990).

    Google Scholar 

  49. E. S. Davidson, L. E. Shar, A. T. Thomas, and J. H. Patel, Effective control for pipelined computers, Proc. of the COMPCON ’90, San Francisco, pp. 181–184 (1990).

    Google Scholar 

  50. A. E. Eichenberger and E. S. Davidson, A Reduced Multipipeline Machine Description that Preserves Scheduling Constraints, Technical Report CSE-TR-266-95, University of Michigan, (1995).

    Google Scholar 

  51. T. L. Adam, K. M. Chandy and J. R. Dickson, A comparison of list schedules for parallel processing Systems, Comm. ACM 17(12):685–690 (1974).

    Article  MATH  Google Scholar 

  52. W. H. Kohler, A preliminary evaluation of the critical path method for scheduling tasks on multiprocessor Systems, IEEE Trans. on Computers C-24(12): 1235–1238 (1975).

    Article  MathSciNet  MATH  Google Scholar 

  53. J. C. Tiernan, An efficient search algorithm to find the elementary circuits of a graph, Comm. ACM 13:722–726 (1970).

    Article  MathSciNet  MATH  Google Scholar 

  54. P. Mateti and N. Deo, On algorithms for enumerating all circuits of a graph, SIAM Journal of Computing 5(1):90–99 (1976).

    Article  MathSciNet  MATH  Google Scholar 

  55. B. D. de Dinechin, An introduction to simplex scheduling, Proc. of the IFIP WG10.3 Working Conf. on Parallel Architectures and Compilation Techniques, PACT ’94, Montreal, Canada, pp. 327–330 (1994).

    Google Scholar 

  56. B. D. de Dinechin, Simplex scheduling: more than lifetime-sensitive instruction scheduling, Technical Report 1994.22, PRISM, 1994.

    Google Scholar 

  57. E. L. Lawler, Combinatorial Optimization: Networks and Matroids, Holt, Rinehart and Winston, 1976.

    MATH  Google Scholar 

  58. C. R. Reeves, Modern Heuristic Techniques for Comhinatorial Problems, Halsted Press: an imprint of John Wiley & Sons, Inc., New York, (1993).

    MATH  Google Scholar 

  59. D. Jacobs, P. Siegel, and K. Wilson, Monte Carlo techniques in code optimization. Proc. of the 15th Ann. Int’l. Symp. on Microprogramming (1982).

  60. S. Beaty, Genetic algorithms and instruction scheduling, Proc. of the 24th Ann. Int’l. Symp. on Microarchitecture, Albuquerque, New Mexico, pp. 206–211 (1991).

    Chapter  Google Scholar 

  61. A. De Gloria, P. Faraboschi, and M. Olivieri, A non-deterministic scheduler for a software pipelining compiler, Proc. of the 25th Ann. Int’l. Symp. on Microarchitecture, Portland, Oregon, pp. 41–44, (1991).

    Google Scholar 

  62. P. Feautrier, Fine-grain scheduling under resource constraints, Proc. of the 7th Ann. Workshop on Languages and Compilers for Parallel Computing, LNCS, Ithaca, New York, (1994).

    Google Scholar 

  63. E. R. Altman, R. Govindrajan, and G. R. Gao, Scheduling and mapping: software pipelining in the presence of hazards Proc. of the ACM SIGPLAN ’95 Conf. on Programming Language Design and Implementation, La Jolla, California, pp. 139–150 (1995).

    Google Scholar 

  64. A. Eichenberger, E. S. Davidson, and S. G. Abraham, Optimum modulo schedules for minimum register requirements, Proc. of the Int’l. Conf. on Supercomputing, Barcelona, Spain (1995).

    Google Scholar 

  65. T. C. Hu, Parallel sequencing and assembly line problems, Operations Research 9(6): 841–848 (1961).

    Article  MathSciNet  Google Scholar 

  66. C. V. Ramamoorthy, K. M. Chandy, and M. J. Gonzalez, Optimal scheduling strategies in a multiprocessor system, IEEE Trans. on Computers C-21(2): 137–146 (1972).

    Article  MathSciNet  MATH  Google Scholar 

  67. M. Berry, D. Chen, D. Kuck, S. Lo, Y. Pang, L. Pointer, R. Roloff, A. Samah, E. Clementi, S. Chin, D. Schneider, G. Fox, P. Messina, D. Walker, C. Hsiung, J. Schwarzmeier, L. Lue, S. Orszag, F. Seidl, O. Johnson, R. Goodrum and J. Martin, The Perfect Club Benchmarks: Effective Performance Evaluation of Supercomputers. The Int’l. J. of Supercomputer Applic. 3(3):5–40 (1989).

    Article  Google Scholar 

  68. J. Uniejewski, SPEC Benchmark Suite: Designed for Today’s Advanced Systems, SPEC Newsletter Vol. 1 No. 1 (1989).

  69. F. H. McMahon, The Livermore Fortran kernels: a computer test of the numerical performance range, Technical Report UCRL-53745, Lawrence Livermore National Laboratory, Livermore, California, 1986.

    Google Scholar 

  70. J. A. Fisher, D. Landskov, and B. D. Shriver, Microcode compaction: looking backward and looking forward, Proc. of the National Computer Conf. pp. 95–102 (1981).

  71. F. Bodin and F. Charot, Loop optimization for horizontal microcoded machines, Proc. of the Int’l Conf. on Supercomputing, Amsterdam, pp. 164–176 (1990)

    Google Scholar 

  72. A. V. Aho, J. E. Hopcroft, and J. D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley, Reading, Massachusetts (1974).

    MATH  Google Scholar 

  73. R. Govindrajan, E. R. Altman, and G. R. Gao, Minimizing register requirements under resource-constrained rate-optimal software pipelining, Proc. of the 27th Ann. Int’l. Symp. on Microarchitecture, San Jose, California, pp. 85–94 (1994).

    Google Scholar 

  74. Q. Ning and G. R. Gao, A novel framework of register allocation for software pipelining, Proc. of the 20th ACM Symp. on Principles of Programming Languages, Charleston, South Carolina, pp. 29–42 (1993).

    Google Scholar 

  75. B. D. de Dinechin, Fast modulo scheduling under the simplex scheduling frameworks, Technical Report 1995.001, 1995.

  76. J. Wang, A. Krall, M. A. Ertl, and C. Eisenbeis, Software pipelining with register allocation and spill, Proc. of the 27th Int’l. Symp. on Microarchitecture, San Jose, California, pp. 95–99 (1994).

    Google Scholar 

  77. A. E. Eichenberger and E. S. Davidson, Stage scheduling: a technique to reduce the register requirements of a modulo schedule, Proc. of the 28th Ann. Int’l. Symp. on Microarchitecture, Ann Arbor, Michigan (1995).

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Rau, B.R. Iterative Modulo Scheduling. Int J Parallel Prog 24, 3–64 (1996). https://doi.org/10.1007/BF03356742

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/BF03356742

Key Words

Navigation