Global Variable Promotion: Using Registers to Reduce Cache Power Dissipation

  • Andrea G. M. Cilio
  • Henk Corporaal
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2304)


Global variable promotion, i.e. allocating unaliased globals to registers, can significantly reduce the number of memory operations, which in turn reduces cache activity and power consumption. This paper evaluates global variable promotion in the context of ILP scheduling and estimates its potential as a software technique for reducing cache power consumption. We measured the frequency and distribution of accesses to global variables and found that a few registers are sufficient to replace the most frequently referenced variables and capture most of the benefit. In our tests, up to 22% of memory operations are removed. Four registers, for example, are sufficient to reduce the energy-delay product by 7 to 26%. Our results suggest that global variable promotion should be included as a standard optimization technique in power-conscious compilers.
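The effect described in the abstract can be illustrated at source level with a minimal C sketch. This is not the authors' compiler implementation, which this page does not describe; it only hand-simulates, within a single function, what keeping an unaliased global in a register saves. The names `hits`, `count_baseline`, and `count_promoted` are hypothetical, and an optimizing compiler may already perform a similar transformation inside a loop on its own.

```c
#include <stddef.h>

/* Hypothetical unaliased global: its address is never taken,
   so no pointer in the program can refer to it. */
static long hits = 0;

/* Baseline: if the compiler keeps `hits` in memory, every matching
   iteration costs one load and one store through the data cache. */
long count_baseline(const int *data, size_t n, int key)
{
    hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] == key)
            hits = hits + 1;    /* load hits, add 1, store hits */
    }
    return hits;
}

/* Hand-promoted variant: the running value lives in a local, which is
   a natural register candidate. The global is written once, after the
   loop, so the per-iteration memory operations on `hits` disappear. */
long count_promoted(const int *data, size_t n, int key)
{
    long h = 0;                 /* stands in for the dedicated register */
    for (size_t i = 0; i < n; i++) {
        if (data[i] == key)
            h = h + 1;          /* register-only update */
    }
    hits = h;                   /* single store back to memory */
    return h;
}
```

Both functions return the same count and leave `hits` in the same state; they differ only in how many memory operations on the global the loop may generate. A compiler-level promotion as discussed in the abstract would go further and keep the variable in a register across the program rather than within one loop.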





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Andrea G. M. Cilio (1)
  • Henk Corporaal (2)

  1. Computer Engineering Dept., Delft University of Technology, Delft, The Netherlands
  2. DESICS division, IMEC, Leuven, Belgium
