Journal of Computer Science and Technology, Volume 23, Issue 1, pp 141–153

CASA: A New IFU Architecture for Power-Efficient Instruction Cache and TLB Designs

  • Han-Xin Sun
  • Kun-Peng Yang
  • Yu-Lai Zhao
  • Dong Tong
  • Xu Cheng
Regular Paper

Abstract

The instruction fetch unit (IFU) usually dissipates a considerable portion of total chip power. In traditional IFU architectures, the fetch address must be sent to the instruction cache and TLB arrays as soon as it is generated. Because the power-saving logic can do little work between fetch address generation and the instruction fetch itself, previous power-saving approaches are unnecessarily restricted by traditional IFU architectures. In this paper, we present CASA, a new power-aware IFU architecture that relaxes these restrictions and gives the power-saving logic of both the instruction cache and the TLB sufficient time and information. By analyzing, recording, and utilizing key information about the dynamic instruction flow early in the front-end pipeline, CASA creates the opportunity to maximize power efficiency while minimizing performance overhead. Compared to the baseline configuration, the leakage and dynamic power of the instruction cache are reduced by 89.7% and 64.1%, respectively, and the dynamic power of the instruction TLB is reduced by 90.2%, while the worst-case performance degradation is only 0.63%. Compared to previous state-of-the-art power-saving approaches, the CASA-based approach saves IFU power more effectively, incurs less performance overhead, and scales better. We hope CASA will stimulate further work on architectural solutions for power-efficient IFU designs.

Keywords

computer architecture, instruction cache, instruction TLB, instruction fetch unit, power-efficient design, dynamic voltage scaling

Copyright information

© Science Press, Beijing, China and Springer Science + Business Media, LLC, USA 2008

Authors and Affiliations

  • Han-Xin Sun (1)
  • Kun-Peng Yang (1)
  • Yu-Lai Zhao (1)
  • Dong Tong (1)
  • Xu Cheng (1)
  1. Microprocessor Research and Development Center, Peking University, Beijing, China
