MIRS: Modulo Scheduling with Integrated Register Spilling

  • Javier Zalamea
  • Josep Llosa
  • Eduard Ayguadé
  • Mateo Valero
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2624)

Abstract

The overlapping of loop iterations in software pipelining imposes high register requirements. A schedule for a loop is valid only if it requires no more registers than the target architecture provides; otherwise, its register requirements have to be reduced by spilling registers to memory. Previous proposals for spilling in software-pipelined loops require a two-step process: the first step performs the actual instruction scheduling without register constraints, and the second step adds spill code (if required) and reschedules the modified loop. The process is repeated until a valid schedule, requiring no more registers than those available, is found.
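The validity condition can be illustrated with a small sketch. It assumes each value's lifetime is a half-open cycle interval within one iteration and that a new iteration starts every `ii` cycles (the initiation interval). The names `max_live` and `is_valid` are hypothetical and do not come from the paper. The peak number of simultaneously live values is only a lower bound on the registers an actual allocation needs, so this is an approximation of the check rather than the authors' method.

```python
from typing import List, Tuple


def max_live(lifetimes: List[Tuple[int, int]], ii: int) -> int:
    """Peak number of values live at once in the steady state of the loop.

    Each lifetime is a half-open interval [start, end) measured in cycles
    of a single iteration; iterations are initiated every `ii` cycles, so
    cycle c of one iteration overlaps slot c % ii of the others.
    """
    pressure = [0] * ii
    for start, end in lifetimes:
        for cycle in range(start, end):
            pressure[cycle % ii] += 1
    return max(pressure)


def is_valid(lifetimes: List[Tuple[int, int]], ii: int, registers: int) -> bool:
    """Assumed validity test: the peak pressure fits the register file."""
    return max_live(lifetimes, ii) <= registers


lifetimes = [(0, 5), (1, 4), (2, 9)]
print(max_live(lifetimes, ii=2))  # 8: tighter overlap, more live values
print(max_live(lifetimes, ii=4))  # 4: slower initiation, lower pressure
```

The same three lifetimes need twice as many registers when the initiation interval is halved, which is why a schedule found without register constraints may have to be repaired with spill code.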

The paper presents MIRS (Modulo scheduling with Integrated Register Spilling), a novel register-constrained modulo scheduler that performs modulo scheduling and register spilling simultaneously in a single step. The algorithm is iterative and uses backtracking to undo previous scheduling decisions whenever resource or dependence conflicts appear. MIRS is compared against a state-of-the-art two-step approach already described in the literature. For this purpose, a workbench composed of a large set of loops from the Perfect Club and a set of processor configurations is used. On average, for the loops that require spill code, a speed-up in the range 14–31% and a reduction of the memory traffic by a factor in the range 0.90–0.72 are achieved.
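To give a feel for what undoing previous scheduling decisions can look like, here is a generic sketch of ejection-based backtracking in a modulo reservation table. It is not the MIRS algorithm: it models a single shared resource, ignores dependences and registers entirely, and every name in it (`modulo_schedule`, `find_schedule`, `budget`) is a hypothetical chosen for illustration.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def modulo_schedule(ops: List[str], earliest: Dict[str, int], ii: int,
                    budget: int = 50) -> Optional[Dict[str, int]]:
    """Give each op a cycle so that no two ops share a slot (cycle % ii)."""
    mrt: Dict[int, str] = {}       # modulo reservation table: slot -> op
    cycle_of: Dict[str, int] = {}  # current placement of each op
    start = dict(earliest)         # next cycle to try for each op
    work = deque(ops)
    while work:
        if budget == 0:            # too much backtracking: give up at this II
            return None
        budget -= 1
        op = work.popleft()
        free = [c for c in range(start[op], start[op] + ii)
                if c % ii not in mrt]
        if free:
            cycle = free[0]
        else:
            # Resource conflict: undo an earlier decision by ejecting the
            # op that holds the wanted slot and queueing it again.
            cycle = start[op]
            victim = mrt.pop(cycle % ii)
            del cycle_of[victim]
            work.append(victim)
        mrt[cycle % ii] = op
        cycle_of[op] = cycle
        start[op] = cycle + 1      # if ejected later, retry one cycle on
    return cycle_of


def find_schedule(ops: List[str], earliest: Dict[str, int],
                  min_ii: int) -> Tuple[int, Dict[str, int]]:
    """Retry with a larger initiation interval until scheduling succeeds."""
    ii = min_ii
    while True:
        result = modulo_schedule(ops, earliest, ii)
        if result is not None:
            return ii, result
        ii += 1


ops = ["load", "mul", "add"]
earliest = {"load": 0, "mul": 2, "add": 2}
print(find_schedule(ops, earliest, min_ii=2))
# (3, {'load': 0, 'mul': 2, 'add': 4})
```

With an initiation interval of 2 the three operations keep ejecting one another until the budget runs out, so the driver falls back to an interval of 3. A register-constrained scheduler in the spirit of the abstract would additionally have to account for register pressure and spill code while placing operations.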

Keywords

  • Instruction-Level Parallelism
  • Software Pipelining
  • Register Allocation
  • Spill Code

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Javier Zalamea (1)
  • Josep Llosa (1)
  • Eduard Ayguadé (1)
  • Mateo Valero (1)
  1. Departament d’Arquitectura de Computadors (UPC), Universitat Politècnica de Catalunya, Spain
