
Power Improvement Using Block-Based Loop Buffer with Innermost Loop Control

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6082)

Abstract

The on-chip cache consumes a substantial portion of the energy in today’s processors. Because loops exhibit strong temporal locality, loop buffers have been proposed to serve loop instructions without accessing the instruction cache. We attempted to apply the concept of a trace cache to the loop buffer architecture; however, a trace cache is quite bulky and complicated. Using a trace cache as a loop buffer does save energy, but it degrades overall performance because of its long latency at the fetch stage. We therefore propose two methods: (1) detecting innermost loops at the commit stage, while filling and activating the loop buffer at the fetch stage; and (2) enabling the loop buffer to store innermost loops that contain forward branches, by packing the instructions captured from the instruction cache into basic blocks. With these modifications, we aim to strengthen the loop buffer so that it improves performance and reduces power further. Results on SPEC2000 show reductions in instruction fetch power of up to 45% (integer benchmarks) and 55% (floating-point benchmarks) compared with a design without a loop buffer. Furthermore, our design improves power by 3% (integer benchmarks) and 2% (floating-point benchmarks) over a loop buffer design that handles loops at the fetch stage.
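The abstract only outlines the control scheme, so the Python sketch below illustrates one way its two mechanisms could interact in a trace-driven model: innermost-loop detection at commit, and block-based filling and use of the buffer at fetch. It is not the authors' implementation. The detection heuristic (the same taken backward branch committing twice with no other taken backward branch in between), the 32-instruction capacity, one PC unit per instruction, and zero fetch-to-commit lag are all assumptions, and `LoopBufferModel`, `Inst`, and `run_demo` are hypothetical names.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict, List, Optional, Tuple


class Mode(Enum):
    IDLE = auto()    # fetch from the instruction cache only
    FILL = auto()    # fetch from the I-cache and copy instructions into the buffer
    ACTIVE = auto()  # serve the instruction from the loop buffer


@dataclass
class Inst:
    pc: int
    is_branch: bool = False
    taken: bool = False
    target: int = 0


@dataclass
class LoopBufferModel:
    capacity: int = 32                         # hypothetical size, in instructions
    mode: Mode = Mode.IDLE
    loop: Optional[Tuple[int, int]] = None     # (start_pc, end_pc) of the buffered loop
    blocks: Dict[int, List[int]] = field(default_factory=dict)  # block start -> PCs
    index: Dict[int, int] = field(default_factory=dict)         # PC -> block start
    cur_block: Optional[int] = None            # block currently being filled
    last_backward: Optional[int] = None        # PC of last committed taken backward branch
    icache_fetches: int = 0
    buffer_fetches: int = 0

    def commit(self, inst: Inst) -> None:
        """Commit stage: detect an innermost loop from committed branches."""
        if not (inst.is_branch and inst.taken and inst.target <= inst.pc):
            return  # only taken backward branches close a loop
        size = inst.pc - inst.target + 1  # assumes one PC unit per instruction
        # Same backward branch twice in a row, with no other taken backward
        # branch committed in between: treat its body as an innermost loop.
        if inst.pc == self.last_backward and size <= self.capacity:
            if self.loop != (inst.target, inst.pc):
                self.loop = (inst.target, inst.pc)
                self.blocks.clear()
                self.index.clear()
                self.cur_block = None
                self.mode = Mode.IDLE
        self.last_backward = inst.pc

    def fetch(self, inst: Inst) -> None:
        """Fetch stage: fill or use the buffer while inside the detected loop."""
        pc = inst.pc
        if self.loop is None or not (self.loop[0] <= pc <= self.loop[1]):
            self.mode = Mode.IDLE  # outside the loop: normal I-cache fetch
            self.cur_block = None
            self.icache_fetches += 1
            return
        if pc in self.index:
            self.mode = Mode.ACTIVE  # already captured: no I-cache access
            self.buffer_fetches += 1
            self.cur_block = None
        else:
            self.mode = Mode.FILL  # not captured yet (e.g. other side of a forward branch)
            self.icache_fetches += 1
            if self.cur_block is None:
                self.cur_block = pc  # start a new basic block at this PC
                self.blocks[pc] = []
            self.blocks[self.cur_block].append(pc)
            self.index[pc] = self.cur_block
        if inst.is_branch:
            self.cur_block = None  # any branch terminates the basic block


def run_demo(iterations: int = 100) -> LoopBufferModel:
    """Loop at PCs 10..15 whose forward branch at PC 12 skips PC 13 on even iterations."""
    trace = [Inst(pc) for pc in range(10)]  # straight-line prologue
    for i in range(iterations):
        skip = i % 2 == 0
        trace += [Inst(10), Inst(11), Inst(12, is_branch=True, taken=skip, target=14)]
        if not skip:
            trace.append(Inst(13))
        trace.append(Inst(14))
        trace.append(Inst(15, is_branch=True, taken=i < iterations - 1, target=10))
    trace.append(Inst(16))  # first instruction after the loop exits
    model = LoopBufferModel()
    for inst in trace:  # simplification: commit immediately follows fetch
        model.fetch(inst)
        model.commit(inst)
    return model


m = run_demo()
print(m.icache_fetches, m.buffer_fetches)  # 28 533
```

In the demo, the loop is detected when its backward branch commits for the second time, and the third iteration fills the buffer as two basic blocks split at the forward branch. The fall-through instruction at PC 13, skipped while filling, is captured as its own block the next time the forward branch is not taken; after that, every fetch inside the loop is served from the buffer. The count of fetches redirected away from the instruction cache is only a proxy for the fetch-power savings reported in the abstract.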


Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhong, MY., Shieh, JJ. (2010). Power Improvement Using Block-Based Loop Buffer with Innermost Loop Control. In: Hsu, CH., Yang, L.T., Park, J.H., Yeo, SS. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2010. Lecture Notes in Computer Science, vol 6082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13136-3_38


  • DOI: https://doi.org/10.1007/978-3-642-13136-3_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13135-6

  • Online ISBN: 978-3-642-13136-3

  • eBook Packages: Computer Science, Computer Science (R0)
