Reducing Delay and Power Consumption of the Wakeup Logic Through Instruction Packing and Tag Memoization

Sharkey, Joseph; Ponomarev, Dmitry; Ghose, Kanad; Ergin, Oguz

doi:10.1007/11574859_2

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3471)

Included in the following conference series:

International Workshop on Power-Aware Computer Systems

Abstract

Dynamic instruction scheduling logic is one of the most critical components of modern superscalar microprocessors, from both the delay and the power dissipation standpoints. The delay and energy required to drive the result tags across the associatively addressed issue queue account for a significant percentage of the scheduler's overhead and also limit design scalability. We propose two schemes to reduce the power consumption and the delay of the wakeup logic. Our first scheme, instruction packing, shares the associative part of an issue queue entry between two instructions, each with at most one non-ready source. As a result, the number of entries in the issue queue (and, hence, the length of the tag buses) can be reduced by a factor of two with almost no impact on IPC, because most instructions either enter the pipeline with at least one of their source operands ready or do not use two source registers to begin with. Our second scheme, tag memoization, avoids driving the upper portion of a tag if those bits did not change from the values driven on the same tag bus during the most recent broadcast. While instruction packing reduces the length of the tag buses, tag memoization reduces the number of tag lines that need to be driven. We evaluate our designs using detailed microarchitectural simulations of the SPEC 2000 benchmarks and SPICE simulations of the issue queue layouts.
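The two mechanisms in the abstract can be illustrated with a small software model that counts issue queue entries and driven tag lines. This is a minimal sketch under stated assumptions, not the authors' circuit design. The rule that an instruction with two non-ready sources occupies a whole entry, the 8-bit tag width, the 5-bit "upper portion", and the names `entries_needed` and `TagBus` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


def entries_needed(non_ready_counts: List[int]) -> int:
    """Issue queue entries needed under instruction packing (assumed model).

    Assumption: an instruction with two non-ready sources needs both
    comparator halves of an entry, while an instruction with at most one
    non-ready source needs only one half, so two such instructions share
    a single entry.
    """
    full = sum(1 for n in non_ready_counts if n == 2)
    half = sum(1 for n in non_ready_counts if n <= 1)
    return full + (half + 1) // 2  # ceil(half / 2) shared entries


@dataclass
class TagBus:
    """One tag bus that memoizes the upper bits of its last full broadcast.

    tag_bits and upper_bits are hypothetical widths, not values from the paper.
    """
    tag_bits: int = 8
    upper_bits: int = 5
    last_upper: Optional[int] = None
    lines_driven: int = 0

    def broadcast(self, tag: int) -> int:
        """Return how many tag lines this broadcast drives."""
        lower_bits = self.tag_bits - self.upper_bits
        upper = tag >> lower_bits
        if upper == self.last_upper:
            # Upper bits unchanged: only the lower lines are driven
            # (assumed: comparators reuse the previously driven upper bits).
            driven = lower_bits
        else:
            driven = self.tag_bits   # drive the full tag
            self.last_upper = upper  # memoize the new upper portion
        self.lines_driven += driven
        return driven
```

For example, five instructions with non-ready source counts `[0, 1, 1, 2, 0]` fit in three packed entries under this model instead of five, and broadcasting tags 177, 182 and 8 on one bus drives 8 + 3 + 8 = 19 tag lines instead of 24, because 177 and 182 share the same upper five bits.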

An erratum to this chapter can be found at http://dx.doi.org/10.1007/11574859_13.

Author information

Authors and Affiliations

Department of Computer Science, State University of New York, Binghamton, NY, 13902-6000, USA
Joseph Sharkey, Dmitry Ponomarev & Kanad Ghose
Intel Barcelona Research Center, Intel Labs, UPC, Barcelona, Spain
Oguz Ergin

Editor information

Editors and Affiliations

Electrical and Computer Engineering, Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
Babak Falsafi
ECE, Purdue University, IN 47907, USA
T. N. VijayKumar

About this paper

Cite this paper

Sharkey, J., Ponomarev, D., Ghose, K., Ergin, O. (2005). Reducing Delay and Power Consumption of the Wakeup Logic Through Instruction Packing and Tag Memoization. In: Falsafi, B., VijayKumar, T.N. (eds) Power-Aware Computer Systems. PACS 2004. Lecture Notes in Computer Science, vol 3471. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11574859_2

DOI: https://doi.org/10.1007/11574859_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29790-1
Online ISBN: 978-3-540-31485-1
eBook Packages: Computer Science, Computer Science (R0)