
International Journal of Parallel Programming, Volume 24, Issue 1, pp 3–64

Iterative Modulo Scheduling

  • B. Ramakrishna Rau

Abstract

Modulo scheduling is a framework within which algorithms for software pipelining innermost loops may be defined. The framework specifies a set of constraints that must be met in order to achieve a legal modulo schedule. A wide variety of algorithms and heuristics can be defined within this framework. Little work has been done to evaluate and compare alternative algorithms and heuristics for modulo scheduling from the viewpoints of schedule quality and computational complexity. This, along with a vague and unfounded perception that modulo scheduling is computationally expensive and difficult to implement, has inhibited its incorporation into product compilers. This paper presents iterative modulo scheduling, a practical algorithm that is capable of dealing with realistic machine models. The paper also characterizes the algorithm in terms of the quality of the generated schedules as well as the computational expense incurred.
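The core iteration the abstract describes can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's algorithm: it handles only acyclic dependences, models resources as a single pool of identical units, and omits the operation eviction and backtracking that the actual iterative scheduler uses to escape bad partial schedules. All names (`Op`, `mii`, `try_modulo_schedule`) are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str
    latency: int = 1
    preds: list = field(default_factory=list)   # ops that must complete first

def mii(ops, num_units):
    # Resource-constrained lower bound on the initiation interval (II):
    # ceil(#ops / #units). A complete implementation takes the maximum of
    # this bound (ResMII) and the recurrence-constrained bound (RecMII)
    # computed over dependence cycles.
    return -(-len(ops) // num_units)

def try_modulo_schedule(ops, ii, num_units, max_tries=1000):
    """Greedy attempt at one II: place each op at the earliest cycle that
    satisfies its dependences and leaves a free slot in the modulo
    reservation table (rows indexed by cycle mod II)."""
    table = {}   # row (t % ii) -> number of ops already issued in that row
    sched = {}   # op name -> start cycle
    for op in ops:            # assumed topologically sorted (acyclic sketch)
        t = max((sched[p.name] + p.latency for p in op.preds), default=0)
        tries = 0
        while table.get(t % ii, 0) >= num_units:
            t += 1            # slide forward until a row has a free slot
            tries += 1
            if tries > max_tries:
                return None   # give up; caller retries with a larger II
        table[t % ii] = table.get(t % ii, 0) + 1
        sched[op.name] = t
    return sched

def iterative_modulo_schedule(ops, num_units):
    # Outer loop of iterative modulo scheduling: start at the lower bound
    # and raise the II until a legal modulo schedule is found.
    ii = mii(ops, num_units)
    while True:
        sched = try_modulo_schedule(ops, ii, num_units)
        if sched is not None:
            return ii, sched
        ii += 1
```

For four independent ops on two units this finds a schedule at II = 2, the resource-constrained minimum: two ops issue in row 0 and two in row 1 of the modulo reservation table. The key property the sketch preserves is that an op occupying cycle t conflicts with every iteration's use of row t mod II.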

Key Words

Modulo scheduling, instruction scheduling, software pipelining, loop scheduling



Copyright information

© Plenum Publishing Corporation 1996

Authors and Affiliations

  • B. Ramakrishna Rau
  1. Hewlett-Packard Laboratories, Palo Alto, USA
