Decoupled State-Execute Architecture

  • Conference paper
High-Performance Computing (ISHPC 2005, ALPS 2006)

Abstract

The majority of register file designs follow one of two well-known approaches. Many modern high-performance processors (POWER4 [1], Pentium 4 [2]) use a merged register file that holds both architectural and rename registers. Other processors (e.g., Opteron [3]) use a Future File, with rename registers kept separately in reservation stations. Both approaches have issues that may limit their application in future microprocessors. The merged register file scales poorly in terms of power-performance, while the Future File pays a large penalty on branch misprediction recovery. In addition, the Future File requires the use of reservation stations, a less scalable mechanism.
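The contrast between the two organizations can be made concrete with a toy model. The sketch below is not taken from the paper: the class names, method names, and register counts are hypothetical, and it models only the textbook behaviour of each scheme (a rename map with a free list for the merged file, a speculative copy restored from committed state for the Future File).

```python
class MergedRegisterFile:
    """Toy merged file: one physical array holds architectural and rename values."""

    def __init__(self, num_arch, num_phys):
        self.map = list(range(num_arch))              # arch reg -> physical reg
        self.free = list(range(num_arch, num_phys))   # unallocated physical regs
        self.values = [0] * num_phys

    def rename_dest(self, arch_reg):
        """Allocate a physical register for a new result; return (new, old)."""
        new = self.free.pop(0)
        old = self.map[arch_reg]
        self.map[arch_reg] = new
        return new, old

    def squash(self, arch_reg, new, old):
        """Undo one rename after a misprediction: only the mapping changes."""
        self.map[arch_reg] = old
        self.free.insert(0, new)

    def read(self, arch_reg):
        return self.values[self.map[arch_reg]]


class FutureFile:
    """Toy Future File: a speculative copy next to the committed state."""

    def __init__(self, num_arch):
        self.arch = [0] * num_arch     # updated when instructions commit
        self.future = [0] * num_arch   # updated at speculative writeback

    def writeback(self, arch_reg, value):
        self.future[arch_reg] = value

    def commit(self, arch_reg, value):
        self.arch[arch_reg] = value

    def recover(self):
        """Restore the speculative copy from committed state; return regs copied."""
        self.future = list(self.arch)
        return len(self.arch)
```

In this simplified model the merged file recovers by restoring a mapping, while the Future File has to copy register values back, which hints at why recovery is the costlier path in the second scheme. Real designs differ in many details that the sketch ignores.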

This paper proposes to combine the best aspects of the traditional Future File architecture with those of the merged physical register file. The key point is that the new architecture separates the processor state, in particular the registers, from the execution units in the pipeline back-end. Therefore it is called Decoupled State-Execute Architecture. The resulting register file can be accessed in the pipeline front-end and has several desirable properties that allow efficient application of several optimizations, most notably register file banking and a novel writeback filtering mechanism. As a result, only a 1.0% IPC degradation was observed with aggressive banking, and the energy consumption was lowered by the new writeback filtering technique. Together, the two optimizations remove approximately 80% of the energy consumed in the register file data array.
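To illustrate why banking can cost some IPC, here is a minimal sketch of a banked register file in which reads that fall in the same bank must be serialized. The abstract does not describe the banking scheme actually used, so the modulo interleaving, the one-port-per-bank default, and the function names are assumptions made purely for illustration.

```python
from collections import Counter


def bank_of(reg, num_banks):
    """Assumed interleaving: consecutive registers go to consecutive banks."""
    return reg % num_banks


def extra_cycles(reads, num_banks, ports_per_bank=1):
    """Extra cycles needed to serve one cycle's worth of register reads.

    Each bank serves at most `ports_per_bank` reads per cycle, so the
    busiest bank determines how long the whole group of reads takes.
    Repeated reads of the same register are counted separately.
    """
    per_bank = Counter(bank_of(r, num_banks) for r in reads)
    if not per_bank:
        return 0
    worst = max(per_bank.values())
    needed = -(-worst // ports_per_bank)   # ceiling division
    return needed - 1


# Four reads spread over four banks: no conflict.
print(extra_cycles([0, 1, 2, 3], num_banks=4))
# Three reads hit bank 0 (registers 0, 4, 8): two extra cycles.
print(extra_cycles([0, 4, 8, 1], num_banks=4))
```

Fewer ports per bank reduce the energy of each access but make conflicts like the second case more likely, which is the trade-off behind the reported IPC degradation.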

References

  1. Tendler, J., Dodson, S., Fields, S., Le, H., Sinharoy, B.: POWER4 system microarchitecture. IBM Journal of Research and Development 46(1) (2002)

  2. Hinton, G., Sager, D., Upton, M., Boggs, D., Carmean, D., Kyker, A., Roussel, P.: The microarchitecture of the Pentium 4 processor. Intel Technology Journal (2001)

  3. Keltcher, C., McGrath, K., Ahmed, A., Conway, P.: The AMD Opteron processor for multiprocessor servers. IEEE Micro 23, 66–76 (2003)

  4. Gowan, M.K., Biro, L.L., Jackson, D.B.: Power considerations in the design of the Alpha 21264. In: Proc. of the 35th Design Automation Conference (1998)

  5. Tomasulo, R.M.: An efficient algorithm for exploiting multiple arithmetic units. IBM Journal of Research and Development, 25–33 (January 1967)

  6. Liptay, J.: Design of the IBM Enterprise System/9000 high-end processor. IBM Journal of Research and Development 36(4) (July 1992)

  7. Yeager, K.C.: The MIPS R10000 superscalar microprocessor. IEEE Micro 16, 28–41 (1996)

  8. Zyuban, V., Kogge, P.: The energy complexity of register files. In: Intl. Symp. on Low Energy Electronics and Design, pp. 305–310 (1998)

  9. Park, I., Powell, M.D., Vijaykumar, T.: Reducing register ports for higher speed and lower energy. In: Proc. of the 35th Annual Intl. Symposium on Microarchitecture (December 2002)

  10. Kim, N.S., Mudge, T.: Reducing register ports using delayed write-back queues and operand pre-fetch. In: Proc. of the 17th ACM Intl. Conf. on Supercomputing (June 2003)

  11. Gonzalez, R., Cristal, A., Ortega, D., Veidenbaum, A., Valero, M.: A content aware integer register file organisation. In: Proc. of the 31st Intl. Symp. on Computer Architecture (2004)

  12. Cruz, J., González, A., Valero, M., Topham, N.: Multiple-banked register file architecture. In: Proc. of the 27th Intl. Symp. on Computer Architecture, pp. 316–325 (2000)

  13. Balasubramonian, R., Dwarkadas, S., Albonesi, D.: Reducing the complexity of the register file in dynamic superscalar processors. In: Proc. of the 34th Intl. Symp. on Microarchitecture (2001)

  14. Zalamea, J., Llosa, J., Ayguadé, E., Valero, M.: Two-level hierarchical register file organization for VLIW processors. In: Proc. of the 33rd Intl. Symp. on Microarchitecture (MICRO-33), pp. 137–146 (2000)

  15. Palacharla, S., Jouppi, N., Smith, J.: Complexity-effective superscalar processors. In: Proc. of the 24th Intl. Symp. on Computer Architecture (1997)

  16. Kessler, R.: The Alpha 21264 microprocessor. IEEE MICRO 19 (March 1999)

  17. Seznec, A., Toullec, E., Rochecouste, O.: Register write specialization register read specialization: a path to complexity-effective wide-issue superscalar processors. In: Proc. of the 35th Intl. Symp. on Microarchitecture, pp. 383–394 (2002)

  18. Smith, J.E., Pleszkun, A.R.: Implementation of precise interrupts in pipelined processors. In: Proc. of the 12th Intl. Symp. on Computer Architecture, pp. 34–44 (1985)

  19. Johnson, M.: Superscalar Microprocessor Design. Prentice-Hall, Englewood Cliffs (1990)

  20. Austin, T., Larson, E., Ernst, D.: Simplescalar: an infrastructure for computer system modeling. IEEE Computer (2002)

  21. Perelman, E., Hamerly, G., Biesbrouck, M.V., Sherwood, T., Calder, B.: Using SimPoint for accurate and efficient simulation. In: Proc. of the Intl. Conf. on Measurement and Modeling of Computer Systems (2003)

  22. Tseng, J., Asanovic, K.: Banked multiported register files for high-frequency superscalar microprocessors. In: Proc. of the 30th Annual Intl. Symp. on Computer Architecture (2003)

  23. Rixner, S., Dally, W.J., Khailany, B., Mattson, P.R., Kapasi, U.J., Owens, J.D.: Register organization for media processing. In: Proc. of the 6th Intl. Symp. on High Performance Computer Architecture, pp. 375–386 (2000)

Editor information

Jesús Labarta Kazuki Joe Toshinori Sato

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pericàs, M., Cristal, A., González, R., Veidenbaum, A., Valero, M. (2008). Decoupled State-Execute Architecture. In: Labarta, J., Joe, K., Sato, T. (eds) High-Performance Computing. ISHPC ALPS 2005 2006. Lecture Notes in Computer Science, vol 4759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77704-5_6

  • DOI: https://doi.org/10.1007/978-3-540-77704-5_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77703-8

  • Online ISBN: 978-3-540-77704-5

  • eBook Packages: Computer Science, Computer Science (R0)
