Optimal and Heuristic Global Code Motion for Minimal Spilling
The interaction of register allocation and instruction scheduling is a well-studied problem: certain ways of arranging instructions within basic blocks reduce overlaps between live ranges, which leads to the insertion of less costly spill code. However, there is little previous research on extending this problem to global code motion, i.e., the motion of instructions between blocks. We present an algorithm that models global code motion as an optimization problem whose goal is to minimize overlaps between live ranges and thereby minimize spill code.
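To make the first point concrete, the following Python sketch counts pairwise live range overlaps in one straight-line block under two orderings of the same dependence graph. The block, its value names, and the modeling conventions are hypothetical simplifications: each value lives from its definition to its last use, a value with no in-block use is treated as live to the end of the block, and a value whose last use is at the instruction that defines another value is assumed to be able to share that value's register.

```python
from itertools import combinations

# Hypothetical straight-line block: each instruction defines a value named
# after itself and reads the listed operand values.
BLOCK = {
    "a": [], "b": [], "c": [], "d": [],  # e.g. loads with no in-block operands
    "x": ["a", "b"],
    "y": ["c", "d"],
    "z": ["x", "y"],                     # z has no in-block use (assumed live-out)
}

def live_intervals(block, order):
    """Map each value to a half-open interval [def, end) of instruction indices."""
    if sorted(order) != sorted(block):
        raise ValueError("order must list every instruction exactly once")
    pos = {name: i for i, name in enumerate(order)}
    for name, operands in block.items():
        for op in operands:
            if pos[op] >= pos[name]:
                raise ValueError(f"{name} is scheduled before its operand {op}")
    intervals = {}
    for name in order:
        uses = [pos[user] for user, ops in block.items() if name in ops]
        # Simplification: a value with no in-block use stays live to the block end.
        end = max(uses) if uses else len(order)
        intervals[name] = (pos[name], end)
    return intervals

def overlapping_pairs(block, order):
    """Pairs of values whose live ranges overlap, i.e. cannot share a register."""
    iv = live_intervals(block, order)
    return {
        (u, v)
        for u, v in combinations(order, 2)
        if iv[u][0] < iv[v][1] and iv[v][0] < iv[u][1]
    }

eager = ["a", "b", "c", "d", "x", "y", "z"]        # all loads first
interleaved = ["a", "b", "x", "c", "d", "y", "z"]  # consume a, b before loading c, d
print(len(overlapping_pairs(BLOCK, eager)))        # 9
print(len(overlapping_pairs(BLOCK, interleaved)))  # 5
```

Both orders respect the same dependences, but the eager order keeps a, b, c and d live at the same time, while the interleaved order ends the live ranges of a and b before c and d are defined, so fewer pairs of values compete for registers.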
Our approach analyzes the program to identify the live range overlaps for all possible placements of instructions in basic blocks and all orderings of instructions within blocks. Using this information, we formulate an optimization problem to determine code motions and partial local schedules that minimize the overall cost of live range overlaps. We obtain and evaluate solutions to this optimization problem using integer linear programming, where feasible, and a simple greedy heuristic.
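The contrast between an exact solution and a greedy one can be illustrated with a deliberately simplified stand-in for the problem: placement only (no local ordering), pairwise overlap costs given directly as input, exhaustive enumeration in place of an integer linear programming solver, and a generic "cheapest block so far" greedy rule. The instruction names, blocks, and costs below are hypothetical, and neither function reproduces the authors' actual formulation or heuristic.

```python
from itertools import combinations, product

# Hypothetical input: blocks each instruction may legally be placed in
# (e.g. between its earliest and latest legal placement).
CANDIDATES = {
    "t1": ["B0", "B1"],
    "t2": ["B0", "B1"],
    "t3": ["B0", "B1"],
}

# Hypothetical cost of an overlap between two instructions' live ranges under
# a given pair of placements; pairs not listed do not overlap (cost 0).
OVERLAP_COST = {
    (("t1", "B0"), ("t2", "B0")): 4,
    (("t1", "B0"), ("t2", "B1")): 4,
    (("t1", "B1"), ("t3", "B0")): 1,
    (("t1", "B1"), ("t3", "B1")): 1,
    (("t2", "B0"), ("t3", "B0")): 3,
}

def pair_cost(a, b):
    """Symmetric lookup of the overlap cost between two (instr, block) placements."""
    return OVERLAP_COST.get((a, b), OVERLAP_COST.get((b, a), 0))

def total_cost(placement):
    """Sum of overlap costs over all pairs of placed instructions."""
    return sum(pair_cost(a, b) for a, b in combinations(placement.items(), 2))

def optimal(candidates):
    """Exhaustive search: a stand-in for an exact ILP solve (exponential, tiny inputs only)."""
    names = list(candidates)
    best = min(
        product(*(candidates[n] for n in names)),
        key=lambda blocks: total_cost(dict(zip(names, blocks))),
    )
    return dict(zip(names, best))

def greedy(candidates):
    """Place instructions one at a time, each in the block that adds the least
    overlap cost relative to the instructions already placed."""
    placement = {}
    for name, blocks in candidates.items():
        placement[name] = min(
            blocks,
            key=lambda blk: sum(pair_cost((name, blk), p) for p in placement.items()),
        )
    return placement

g, o = greedy(CANDIDATES), optimal(CANDIDATES)
print(g, total_cost(g))  # {'t1': 'B0', 't2': 'B0', 't3': 'B1'} 4
print(o, total_cost(o))  # {'t1': 'B1', 't2': 'B0', 't3': 'B1'} 1
```

In this toy instance the greedy rule commits t1 to B0 before it sees that this choice forces a cost of 4 onto t2, whereas the exhaustive search finds a placement of total cost 1. An ILP solver plays the exhaustive search's role on realistic inputs, where full enumeration is infeasible.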
We conclude that global code motion with the sole goal of avoiding spills rarely leads to performance improvements because code is placed too conservatively. On the other hand, purely local optimal instruction scheduling for minimal spilling is effective at improving performance when compared to a heuristic scheduler for minimal register use.
Keywords: Basic Block, Dependence Graph, Register Allocation, Instruction Schedule, Code Motion