Abstract
Dynamic instruction scheduling logic is one of the most critical components of modern superscalar microprocessors, in terms of both delay and power dissipation. The delay and energy required to drive the result tags across the associatively addressed issue queue account for a significant percentage of the scheduler’s overhead and also limit the scalability of the design. We propose two schemes to reduce the power consumption and the delay of the wakeup logic. Our first scheme, instruction packing, shares the associative part of an issue queue entry between two instructions, each with at most one non-ready source. As a result, the number of entries in the issue queue (and hence the length of the tag buses) can be reduced by a factor of two with almost no impact on IPC, because most instructions either enter the pipeline with at least one of their source operands ready or do not use two source registers to begin with. Our second scheme, tag memoization, avoids driving the upper portion of a tag if those bits have not changed from what was driven on the same tag bus during the most recent broadcast. While instruction packing reduces the length of the tag buses, tag memoization reduces the number of tag lines that need to be driven. We evaluate our designs using detailed microarchitectural simulations of the SPEC 2000 benchmarks and SPICE simulations of the issue queue layouts.
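The two schemes described in the abstract can be illustrated with a small behavioral sketch. It is not the authors' implementation: the function names, the representation of an instruction as a tuple of its non-ready source tags, the 8-bit tag width, and the 4-bit split between lower and upper tag bits are all assumptions made for illustration. The sketch only counts issue queue entries and driven tag lines; it models neither timing nor energy.

```python
from typing import Sequence, Tuple

# Hypothetical model: an instruction is the tuple of tags of its
# non-ready source operands (zero, one, or two tags).
Instr = Tuple[int, ...]


def entries_needed(instrs: Sequence[Instr]) -> int:
    """Count issue queue entries under instruction packing.

    Assumption: an entry holds either one instruction with two
    non-ready sources, or up to two instructions that each have
    at most one non-ready source.
    """
    full = sum(1 for srcs in instrs if len(srcs) == 2)
    packable = sum(1 for srcs in instrs if len(srcs) <= 1)
    # Packable instructions share entries in pairs (round up for an odd one).
    return full + (packable + 1) // 2


def lines_driven(tags: Sequence[int], tag_bits: int = 8, low_bits: int = 4) -> int:
    """Count tag lines driven on one tag bus under tag memoization.

    Assumption: the lower `low_bits` are driven on every broadcast,
    while the upper bits are driven only when they differ from the
    upper bits of the previous broadcast on the same bus.
    """
    prev_upper = None
    driven = 0
    for tag in tags:
        upper = tag >> low_bits
        driven += low_bits
        if upper != prev_upper:
            driven += tag_bits - low_bits
            prev_upper = upper
    return driven


# Four instructions: only one has two non-ready sources.
queue = [(), (5,), (3, 7), (9,)]
packed_entries = entries_needed(queue)

# Four broadcasts on one bus; the first three share the same upper bits.
broadcasts = [0x12, 0x15, 0x1A, 0x23]
memoized_lines = lines_driven(broadcasts)
baseline_lines = 8 * len(broadcasts)
```

In this toy run, the four instructions fit in three entries instead of four, and the four broadcasts drive 24 tag lines instead of 32. Any real savings depend on the workload and on circuit details that the abstract does not give.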
An erratum to this chapter can be found at http://dx.doi.org/10.1007/11574859_13 .
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sharkey, J., Ponomarev, D., Ghose, K., Ergin, O. (2005). Reducing Delay and Power Consumption of the Wakeup Logic Through Instruction Packing and Tag Memoization. In: Falsafi, B., VijayKumar, T.N. (eds) Power-Aware Computer Systems. PACS 2004. Lecture Notes in Computer Science, vol 3471. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11574859_2
DOI: https://doi.org/10.1007/11574859_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29790-1
Online ISBN: 978-3-540-31485-1
eBook Packages: Computer Science, Computer Science (R0)