Instruction Balance and Its Relation to Program Energy Consumption

  • Tao Li
  • Chen Ding
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2624)


A computer consists of multiple components such as functional units, cache, and main memory. At each moment of execution, a program may have a different amount of work for each component. Recent developments have exploited this imbalance to save energy by slowing the components that carry a lower load; example techniques include the dynamic scaling and clock gating used in processors from Transmeta and Intel. Symmetrical to reconfiguring the hardware is reorganizing the software: we can alter a program's demand for different components by reordering its instructions. This paper explores the theoretical lower bound of energy consumption, assuming that both the program and the machine are fully adjustable. It shows that a program with a balanced load always consumes less energy than the same program with uneven loads at the same execution speed. In addition, the paper examines the relation between energy consumption and program performance, showing that reducing power is a different problem from improving performance. Finally, the paper presents empirical evidence that a program may be transformed to have a balanced demand in most parts of its execution.
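The abstract's claim that a balanced load minimizes energy at a fixed speed rests on power being a convex function of component speed. As an illustrative sketch (not taken from the paper), assume the common voltage-scaling model in which dynamic power grows roughly cubically with clock frequency (P ~ C·V²·f with V scaling along with f). A small Python example then shows why a constant-rate schedule beats an uneven one that finishes the same work in the same time:

```python
# Illustrative model, assumed for this sketch: power P(f) = f**3 (cubic in
# speed under voltage/frequency scaling). Energy is power integrated over
# time, here summed over constant-rate phases.

def energy(phases):
    """Energy of a schedule given as (rate, duration) phases, with P(f) = f**3."""
    return sum(rate ** 3 * duration for rate, duration in phases)

def work(phases):
    """Total work done by a schedule: rate times duration, summed over phases."""
    return sum(rate * duration for rate, duration in phases)

# Two schedules that each complete 2 units of work in 2 units of time:
balanced = [(1.0, 2.0)]                # constant rate 1.0 throughout
imbalanced = [(1.5, 1.0), (0.5, 1.0)]  # a fast phase followed by a slow phase

assert work(balanced) == work(imbalanced) == 2.0  # same work, same total time

# The balanced schedule consumes less energy: 1.0**3 * 2 = 2.0
# versus 1.5**3 + 0.5**3 = 3.5 for the uneven schedule.
print(energy(balanced), energy(imbalanced))
```

By Jensen's inequality, the same conclusion holds for any convex power function, not just the cubic model assumed here: averaging the rate over time can only reduce total energy when work and deadline are held fixed.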


Keywords: Execution Time · Switching Cost · Execution Trace · Memory Operation · Dynamic Scaling





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Tao Li (1)
  • Chen Ding (1)

  1. Computer Science Department, University of Rochester, Rochester
