Abstract
The on-chip cache consumes a substantial portion of the energy in today's processors. Because loops exhibit temporal locality, loop buffers have been proposed to exploit it. We attempted to apply the trace cache concept to the loop buffer architecture; however, a trace cache is quite bulky and complicated. Using a trace cache as a loop buffer does save energy, but it degrades overall performance because of the long latency at the fetch stage. We therefore propose two methods: (1) detecting innermost loops at the commit stage while filling and activating the loop buffer at the fetch stage; and (2) helping the loop buffer store innermost loops that contain forward branches by packing the instructions captured from the instruction cache into basic blocks. With these modifications, we aim to strengthen the loop buffer so that it improves performance and further reduces power. Results with SPEC2000 indicate reductions in instruction fetch power of up to 45% (integer benchmarks) and 55% (floating-point benchmarks) compared with a design without a loop buffer. Furthermore, we obtained a power improvement of 3% (integer benchmarks) and 2% (floating-point benchmarks) over a loop buffer design that handles loops at the fetch stage.
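The split between commit-stage detection and fetch-stage fill/activate described in method (1) can be illustrated with a small behavioral model. The sketch below is not the authors' design: the class `LoopBuffer`, the states `IDLE`/`FILL`/`ACTIVE`, the `capacity` parameter, and the helper `simulate` are all hypothetical names, and the model assumes an in-order trace with no delay between fetch and commit. It treats a committed, taken backward branch as the marker of a candidate loop, fills the buffer on the next pass, and serves later iterations from the buffer. It does not model basic-block packing from method (2), beyond letting instructions on a not-yet-captured forward path be added while the loop is active.

```python
from dataclasses import dataclass, field

IDLE, FILL, ACTIVE = "IDLE", "FILL", "ACTIVE"  # hypothetical controller states


@dataclass
class LoopBuffer:
    capacity: int = 32          # assumed buffer size, in instructions
    state: str = IDLE
    start: int = -1             # branch target (first instruction of the loop)
    end: int = -1               # pc of the loop-closing backward branch
    lines: dict = field(default_factory=dict)  # pcs currently held in the buffer
    icache_fetches: int = 0
    buffer_fetches: int = 0

    def _reset(self):
        self.state, self.start, self.end = IDLE, -1, -1
        self.lines.clear()

    def on_commit(self, pc, taken, target):
        """Commit stage: a taken backward branch marks a candidate loop."""
        if not taken or target > pc:
            return                      # not a backward taken branch
        if (target, pc) == (self.start, self.end):
            return                      # same loop as the one already tracked
        self._reset()
        if pc - target + 1 <= self.capacity:
            self.start, self.end, self.state = target, pc, FILL

    def fetch(self, pc):
        """Fetch stage: fill the buffer, or serve from it once active."""
        inside = self.start <= pc <= self.end
        if self.state != IDLE and not inside:
            self._reset()               # control flow left the tracked loop
        if self.state == ACTIVE and pc in self.lines:
            self.buffer_fetches += 1
            return "buffer"
        self.icache_fetches += 1
        if self.state != IDLE:          # FILL, or ACTIVE miss on an uncaptured path
            self.lines[pc] = True
            if self.state == FILL and pc == self.end:
                self.state = ACTIVE     # whole pass captured; activate
        return "icache"


def simulate(trace, capacity=32):
    """Run (pc, taken, target) tuples through fetch, then commit, in order."""
    lb = LoopBuffer(capacity)
    for pc, taken, target in trace:
        lb.fetch(pc)
        lb.on_commit(pc, taken, target)
    return lb


# Five iterations of a 4-instruction loop (pcs 10..13), then fall through to 14.
trace = []
for i in range(5):
    trace += [(10, False, 0), (11, False, 0), (12, False, 0), (13, i < 4, 10)]
trace.append((14, False, 0))

lb = simulate(trace)
print(lb.icache_fetches, lb.buffer_fetches)  # prints "9 12"
```

In this toy trace the first iteration detects the loop, the second fills the buffer, and the remaining three are served from it, so 12 of the 21 fetches bypass the instruction cache. With nested loops this simple model re-detects the inner loop on every outer iteration; a real controller would need additional logic to avoid that.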
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhong, MY., Shieh, JJ. (2010). Power Improvement Using Block-Based Loop Buffer with Innermost Loop Control. In: Hsu, CH., Yang, L.T., Park, J.H., Yeo, SS. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2010. Lecture Notes in Computer Science, vol 6082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13136-3_38
DOI: https://doi.org/10.1007/978-3-642-13136-3_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13135-6
Online ISBN: 978-3-642-13136-3
eBook Packages: Computer Science (R0)