“Look It Up” or “Do the Math”: An Energy, Area, and Timing Analysis of Instruction Reuse and Memoization

  • Daniel Citron
  • Dror G. Feitelson
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3164)


Instruction reuse and memoization exploit the fact that during a program run there are operations that execute more than once with the same operand values. By saving previous occurrences of instructions (operands and result) in dedicated, on-chip lookup tables, it is possible to avoid re-execution of these instructions. This has been shown to be efficient in a naive model that assumes single-cycle table lookup. We now extend the analysis to consider the energy, area, and timing overheads of maintaining such tables.
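As a rough illustration of the lookup-table idea described above, the following is a minimal software sketch, not the hardware design evaluated in the paper. The class name `MemoTable`, the direct-mapped organization, the XOR index function, and the default table size are all assumptions made for this example.

```python
class MemoTable:
    """Direct-mapped table keyed by operand values (illustrative sketch only)."""

    def __init__(self, num_entries: int = 64) -> None:
        # Each entry holds (operand_a, operand_b, result), or None when empty.
        self.entries = [None] * num_entries
        self.hits = 0
        self.misses = 0

    def _index(self, a: int, b: int) -> int:
        # Hypothetical index function: XOR the operands and keep the low bits.
        return (a ^ b) % len(self.entries)

    def multiply(self, a: int, b: int) -> int:
        idx = self._index(a, b)
        entry = self.entries[idx]
        # Hit: both stored operands match, so the stored result is reused.
        if entry is not None and entry[0] == a and entry[1] == b:
            self.hits += 1
            return entry[2]
        # Miss: do the math, then overwrite the entry for future reuse.
        self.misses += 1
        result = a * b
        self.entries[idx] = (a, b, result)
        return result


table = MemoTable(num_entries=16)
stream = [(3, 5), (7, 9), (3, 5), (7, 9), (3, 5)]
results = [table.multiply(a, b) for a, b in stream]
print(results, table.hits, table.misses)  # [15, 63, 15, 63, 15] 3 2
```

In this sketch every operation pays for a lookup whether or not it hits, which is the kind of overhead that a single-cycle, zero-cost lookup model leaves out.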

We show that reuse opportunities abound in the SPEC CPU2000 benchmark suite, and that by judiciously selecting table configurations it is possible to exploit these opportunities with a minimal penalty. Energy consumption can be further reduced by employing confidence counters, which enable instructions that have a history of failed memoizations to be filtered out. We conclude by identifying those instructions that profit most from memoization, and the conditions under which it is beneficial.
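The confidence-counter filter mentioned above could be sketched as follows. This is a hedged illustration under stated assumptions: the class name `ConfidenceFilter`, the per-instruction-address saturating counter, and the counter range, initial value, and threshold are all hypothetical choices, not the parameters used in the paper.

```python
class ConfidenceFilter:
    """Per-instruction saturating counters that gate table lookups (sketch)."""

    def __init__(self, max_count: int = 3, threshold: int = 1, initial: int = 2) -> None:
        self.max_count = max_count   # counter saturates at this value
        self.threshold = threshold   # lookups are allowed at or above this value
        self.initial = initial       # assumed starting confidence for new instructions
        self.counters = {}           # instruction address -> current counter value

    def should_lookup(self, pc: int) -> bool:
        # Instructions with a history of failed memoizations are filtered out.
        return self.counters.get(pc, self.initial) >= self.threshold

    def update(self, pc: int, hit: bool) -> None:
        # Increment on a successful reuse, decrement on a failure, saturating both ways.
        count = self.counters.get(pc, self.initial)
        count = min(count + 1, self.max_count) if hit else max(count - 1, 0)
        self.counters[pc] = count


conf = ConfidenceFilter()
pc = 0x400                       # hypothetical instruction address
conf.update(pc, hit=False)       # counter: 2 -> 1, still eligible
conf.update(pc, hit=False)       # counter: 1 -> 0, now filtered
print(conf.should_lookup(pc))    # False
```

Note that in this simplified version a filtered instruction is never looked up again, so its counter cannot recover; a practical design would need some way to retrain it.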


Keywords: Memoization, Reuse, Energy, Area, Lookup





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Daniel Citron: IBM Haifa Labs, Haifa University Campus, Haifa, Israel
  • Dror G. Feitelson: School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
