CASA: A New IFU Architecture for Power-Efficient Instruction Cache and TLB Designs

Sun, Han-Xin; Yang, Kun-Peng; Zhao, Yu-Lai; Tong, Dong; Cheng, Xu

doi:10.1007/s11390-008-9117-z

Regular Paper
Published: 31 January 2008

Journal of Computer Science and Technology, Volume 23, pages 141–153 (2008)

Abstract

The instruction fetch unit (IFU) usually dissipates a considerable portion of the total chip power. In traditional IFU architectures, the fetch address must be sent to the instruction cache and TLB arrays as soon as it is generated. Because the power-saving logic can do little work between fetch address generation and instruction fetch, previous power-saving approaches usually suffer from unnecessary restrictions imposed by the traditional IFU architecture. In this paper, we present CASA, a new power-aware IFU architecture that effectively relaxes these restrictions and gives the power-saving logic of both the instruction cache and the TLB sufficient time and information. By analyzing, recording, and utilizing key information about the dynamic instruction flow early in the front-end pipeline, CASA makes it possible to maximize power efficiency while minimizing performance overhead. Compared to the baseline configuration, the leakage and dynamic power of the instruction cache are reduced by 89.7% and 64.1%, respectively, and the dynamic power of the instruction TLB is reduced by 90.2%. Meanwhile, the worst-case performance degradation is only 0.63%. Compared to previous state-of-the-art power-saving approaches, the CASA-based approach saves IFU power more effectively, incurs less performance overhead, and scales better. We expect that CASA can stimulate further work on architectural solutions for power-efficient IFU designs.
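The abstract does not describe CASA's internal mechanism, so the following is only an illustrative sketch of the general idea of giving power-saving logic "time and information" before the array access. It assumes a hypothetical queue that holds fetch addresses between address generation and the instruction cache/iTLB access. Hint logic tags each queued entry with whether it needs a fresh iTLB lookup (it does not if it lies on the same page as the previous fetch) and whether it touches a new cache line. The names `FetchEntry` and `annotate`, the 4 KiB page size, and the 32-byte line size are all assumptions, and this is not the authors' design.

```python
from collections import deque
from dataclasses import dataclass

PAGE_SHIFT = 12  # assumption: 4 KiB pages
LINE_SHIFT = 5   # assumption: 32-byte instruction cache lines


@dataclass
class FetchEntry:
    """One queued fetch request plus power-saving hints (hypothetical structure)."""
    addr: int
    need_tlb: bool = True  # False: same page as previous fetch, so the translation could be reused
    new_line: bool = True  # False: same cache line as previous fetch (could be served by an assumed line buffer)


def annotate(addrs, page_shift=PAGE_SHIFT, line_shift=LINE_SHIFT):
    """Queue fetch addresses ahead of the array access and tag each entry.

    Because the addresses wait in the queue, the hint logic examines each one
    before the instruction cache / iTLB access rather than in the same cycle.
    """
    queue = deque(FetchEntry(a) for a in addrs)
    prev = None
    for entry in queue:
        if prev is not None:
            entry.need_tlb = (entry.addr >> page_shift) != (prev >> page_shift)
            entry.new_line = (entry.addr >> line_shift) != (prev >> line_shift)
        prev = entry.addr
    return queue


if __name__ == "__main__":
    fetches = annotate([0x1000, 0x1004, 0x1008, 0x1020, 0x2000])
    print(sum(e.need_tlb for e in fetches))  # 2 iTLB lookups instead of 5
    print(sum(e.new_line for e in fetches))  # 3 distinct cache-line accesses
```

In this toy trace, only the first fetch and the jump to a new page need a translation. This hints at why knowing fetch addresses early can save TLB and cache energy, but the real savings depend on the specific design evaluated in the paper.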

Author information

Authors and Affiliations

Microprocessor Research and Development Center, Peking University, Beijing, 100871, China
Han-Xin Sun, Kun-Peng Yang, Yu-Lai Zhao, Dong Tong & Xu Cheng

Corresponding author

Correspondence to Han-Xin Sun.

Additional information

Supported by the National High Technology Development 863 Program of China under Grant No. 2004AA1Z1010.

Electronic Supplementary Material

An electronic supplementary PDF file (82.7 KB) accompanies this article.

About this article

Cite this article

Sun, HX., Yang, KP., Zhao, YL. et al. CASA: A New IFU Architecture for Power-Efficient Instruction Cache and TLB Designs. J. Comput. Sci. Technol. 23, 141–153 (2008). https://doi.org/10.1007/s11390-008-9117-z

Received: 07 January 2007
Revised: 09 August 2007
Published: 31 January 2008
Issue Date: January 2008
DOI: https://doi.org/10.1007/s11390-008-9117-z
