Hot-and-Cold: Using Criticality in the Design of Energy-Efficient Caches

  • Rajeev Balasubramonian
  • Viji Srinivasan
  • Sandhya Dwarkadas
  • Alper Buyuktosunoglu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3164)

Abstract

As technology scales and processor speeds improve, power has become a first-order design constraint in all aspects of processor design. In this paper, we explore the use of criticality metrics to reduce dynamic and leakage energy within data caches. We leverage the ability to predict whether an access is on the application’s critical path to partition accesses into multiple streams. Accesses on the critical path are serviced by a high-performance (hot) cache bank. Accesses off the critical path are serviced by a lower-energy, lower-performance (cold) cache bank. The resulting organization is a physically banked cache in which each bank has a different level of energy consumption and performance. Our results demonstrate that instructions and data can be classified into these two streams with high accuracy. Each additional cycle in the cold cache access time slows performance down by only 0.8%. However, such a partition can increase contention for cache banks and entail non-negligible hardware overhead. While prior research has effectively employed criticality metrics to reduce power in arithmetic units, our analysis shows that the success of these techniques is limited when applied to data caches.
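The steering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Bank` class, the `steer` and `estimated_slowdown` helpers, the latency and energy numbers, and the toy trace are all hypothetical and chosen only for illustration. The criticality prediction is taken as a given input rather than modeled, and the slowdown estimate assumes the 0.8% per-cycle figure from the abstract scales linearly.

```python
from dataclasses import dataclass


@dataclass
class Bank:
    """A cache bank with hypothetical latency and per-access energy."""
    name: str
    latency_cycles: int        # assumed value, not from the paper
    energy_per_access: float   # arbitrary units, assumed value
    accesses: int = 0


def steer(trace, hot, cold):
    """Send predicted-critical accesses to the hot bank, the rest to the cold bank.

    `trace` is a list of (address, predicted_critical) pairs. The prediction is
    supplied by the caller; this sketch does not model a predictor.
    Returns the total access energy in arbitrary units.
    """
    total_energy = 0.0
    for _address, predicted_critical in trace:
        bank = hot if predicted_critical else cold
        bank.accesses += 1
        total_energy += bank.energy_per_access
    return total_energy


def estimated_slowdown(extra_cold_cycles, per_cycle=0.008):
    """Slowdown estimate assuming the 0.8% per-cycle cost adds up linearly."""
    return per_cycle * extra_cold_cycles


hot = Bank("hot", latency_cycles=2, energy_per_access=1.0)
cold = Bank("cold", latency_cycles=4, energy_per_access=0.5)

# Toy trace: two accesses predicted critical, three predicted non-critical.
trace = [
    (0x1000, True),
    (0x2000, False),
    (0x1008, True),
    (0x3000, False),
    (0x3008, False),
]

banked_energy = steer(trace, hot, cold)
baseline_energy = len(trace) * hot.energy_per_access  # every access served by the hot bank
slowdown = estimated_slowdown(cold.latency_cycles - hot.latency_cycles)

print(f"hot accesses: {hot.accesses}, cold accesses: {cold.accesses}")
print(f"energy: {banked_energy} (banked) vs {baseline_energy} (all hot)")
print(f"estimated slowdown: {slowdown:.1%}")
```

The sketch leaves out the costs the abstract warns about: contention for cache banks and the hardware overhead of prediction and steering are not modeled, so the energy comparison it prints is optimistic by construction.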

Keywords

Critical Path, Data Block, Data Cache, Cache Line, Cache Block

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Rajeev Balasubramonian (1)
  • Viji Srinivasan (2)
  • Sandhya Dwarkadas (3)
  • Alper Buyuktosunoglu (2)

  1. School of Computing, University of Utah, USA
  2. IBM T.J. Watson Research Center, USA
  3. Department of Computer Science, University of Rochester, USA