CASA: A New IFU Architecture for Power-Efficient Instruction Cache and TLB Designs
The instruction fetch unit (IFU) usually dissipates a considerable portion of total chip power. In traditional IFU architectures, the fetch address is sent to the instruction cache and TLB arrays as soon as it is generated. Because the power-saving logic can do little work between fetch-address generation and the instruction fetch itself, previous power-saving approaches usually suffer from unnecessary restrictions imposed by traditional IFU architectures. In this paper, we present CASA, a new power-aware IFU architecture that effectively relaxes these restrictions and gives the power-saving logic of both the instruction cache and the instruction TLB sufficient time and information. By analyzing, recording, and using key information about the dynamic instruction flow early in the front-end pipeline, CASA makes it possible to maximize power efficiency while minimizing performance overhead. Compared to the baseline configuration, the leakage and dynamic power of the instruction cache are reduced by 89.7% and 64.1%, respectively, and the dynamic power of the instruction TLB is reduced by 90.2%. Meanwhile, the worst-case performance degradation is only 0.63%. Compared to previous state-of-the-art power-saving approaches, the CASA-based approach saves IFU power more effectively, incurs less performance overhead, and scales better. CASA promises to stimulate further work on architectural solutions for power-efficient IFU designs.
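The abstract does not spell out CASA's internal mechanism, so the following is only a generic illustration of why knowing the fetch stream early helps the power-saving logic, not the authors' design. It assumes 4 KiB pages and a hypothetical filter that probes the instruction TLB only when a fetch lands on a different virtual page than the last translated one; the function name `count_itlb_lookups` is invented for this sketch.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB virtual pages


def count_itlb_lookups(fetch_addrs, page_size=PAGE_SIZE):
    """Return (baseline_lookups, filtered_lookups) for a trace of fetch addresses.

    Baseline: every fetch probes the instruction TLB.
    Filtered (hypothetical early-decision logic, not CASA itself): the iTLB is
    probed only when the fetch falls on a different virtual page than the most
    recently translated one; otherwise the previous translation is reused.
    """
    baseline = 0
    filtered = 0
    last_page = None  # page number of the most recent translation
    for addr in fetch_addrs:
        baseline += 1  # a traditional IFU probes the iTLB on every fetch
        page = addr // page_size
        if page != last_page:
            filtered += 1  # page changed, so a real lookup is required
            last_page = page
    return baseline, filtered


# Eight sequential 16-byte fetch blocks, a jump to the next page, then a branch back.
trace = [0x1000 + 16 * i for i in range(8)] + [0x2000, 0x2010, 0x1040]
print(count_itlb_lookups(trace))
```

For this trace the baseline performs 11 lookups while the filter performs 3. A real design would also have to invalidate the saved translation on TLB updates or context switches, which this sketch ignores.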
Keywords: computer architecture; instruction cache; instruction TLB; instruction fetch unit; power-efficient design; dynamic voltage scaling