
International Journal of Parallel Programming, Volume 24, Issue 1, pp 3–64

Iterative Modulo Scheduling

  • B. Ramakrishna Rau

Abstract

Modulo scheduling is a framework within which algorithms for software pipelining innermost loops may be defined. The framework specifies a set of constraints that must be met in order to achieve a legal modulo schedule. A wide variety of algorithms and heuristics can be defined within this framework. Little work has been done to evaluate and compare alternative algorithms and heuristics for modulo scheduling from the viewpoints of schedule quality and computational complexity. This, along with a vague and unfounded perception that modulo scheduling is computationally expensive and difficult to implement, has inhibited its incorporation into product compilers. This paper presents iterative modulo scheduling, a practical algorithm that is capable of dealing with realistic machine models. The paper also characterizes the algorithm in terms of the quality of the generated schedules as well as the computational expense incurred.
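The core iteration the abstract describes can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's algorithm: it handles only acyclic dependences, models resources as a single pool of identical units, and omits the operation eviction and backtracking that the actual iterative scheduler uses to escape bad partial schedules. All names (`Op`, `mii`, `try_modulo_schedule`) are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str
    latency: int = 1
    preds: list = field(default_factory=list)   # ops that must complete first

def mii(ops, num_units):
    # Resource-constrained lower bound on the initiation interval (II):
    # ceil(#ops / #units). A complete implementation takes the maximum of
    # this bound (ResMII) and the recurrence-constrained bound (RecMII)
    # computed over dependence cycles.
    return -(-len(ops) // num_units)

def try_modulo_schedule(ops, ii, num_units, max_tries=1000):
    """Greedy attempt at one II: place each op at the earliest cycle that
    satisfies its dependences and leaves a free slot in the modulo
    reservation table (rows indexed by cycle mod II)."""
    table = {}   # row (t % ii) -> number of ops already issued in that row
    sched = {}   # op name -> start cycle
    for op in ops:            # assumed topologically sorted (acyclic sketch)
        t = max((sched[p.name] + p.latency for p in op.preds), default=0)
        tries = 0
        while table.get(t % ii, 0) >= num_units:
            t += 1            # slide forward until a row has a free slot
            tries += 1
            if tries > max_tries:
                return None   # give up; caller retries with a larger II
        table[t % ii] = table.get(t % ii, 0) + 1
        sched[op.name] = t
    return sched

def iterative_modulo_schedule(ops, num_units):
    # Outer loop of iterative modulo scheduling: start at the lower bound
    # and raise the II until a legal modulo schedule is found.
    ii = mii(ops, num_units)
    while True:
        sched = try_modulo_schedule(ops, ii, num_units)
        if sched is not None:
            return ii, sched
        ii += 1
```

For four independent ops on two units this finds a schedule at II = 2, the resource-constrained minimum: two ops issue in row 0 and two in row 1 of the modulo reservation table. The key property the sketch preserves is that an op occupying cycle t conflicts with every iteration's use of row t mod II.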

Key Words

Modulo scheduling, instruction scheduling, software pipelining, loop scheduling



Copyright information

© Plenum Publishing Corporation 1996

Authors and Affiliations

  • B. Ramakrishna Rau
  1. Hewlett-Packard Laboratories, Palo Alto, USA
