Decoupled State-Execute Architecture

Pericàs, Miquel; Cristal, Adrian; González, Ruben; Veidenbaum, Alex; Valero, Mateo

doi:10.1007/978-3-540-77704-5_6


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4759)


Abstract

The majority of register file designs follow one of two well-known approaches. Many modern high-performance processors (POWER4 [1], Pentium 4 [2]) use a merged register file that holds both architectural and rename registers. Other processors use a Future File (e.g., Opteron [3]) with rename registers kept separately in reservation stations. Both approaches have issues that may limit their application in future microprocessors. The merged register file scales poorly in terms of power-performance, while the Future File pays a large penalty on branch misprediction recovery. In addition, the Future File requires reservation stations, a less scalable mechanism.
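
To make the recovery penalty mentioned above concrete, here is a minimal sketch of the classic Future File idea: a speculative file written at completion and an architectural file written at commit. It is an illustrative toy and not the design evaluated in the paper. The class name `FutureFileModel`, its methods, and the use of "registers copied" as a stand-in for the penalty are all assumptions made for this example.

```python
class FutureFileModel:
    """Toy model (hypothetical): a future file plus an architectural file."""

    def __init__(self, num_regs):
        self.arch = [0] * num_regs    # committed, in-order state
        self.future = [0] * num_regs  # most recent, possibly speculative state

    def complete(self, reg, value):
        # Out-of-order completion writes only the future file.
        self.future[reg] = value

    def commit(self, reg, value):
        # In-order commit makes the value architectural.
        self.arch[reg] = value

    def recover(self):
        # Misprediction recovery: discard speculative values by copying the
        # architectural file over the future file.
        self.future = list(self.arch)
        # Number of registers copied, used here as a crude proxy for the cost.
        return len(self.arch)


ff = FutureFileModel(num_regs=4)
ff.complete(1, 10)      # correct-path instruction completes
ff.commit(1, 10)        # and later commits
ff.complete(2, 99)      # wrong-path instruction completes speculatively
copied = ff.recover()   # branch turns out to be mispredicted
print(ff.future, copied)
```

In this toy the cost of `recover` grows with the number of architectural registers regardless of how little wrong-path work was done, which is one simple way to picture why recovery can be expensive in this style of design.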

This paper proposes to combine the best aspects of the traditional Future File architecture with those of the merged physical register file. The key point is that the new architecture separates the processor state, in particular the registers, from the execution units in the pipeline back-end. It is therefore called the Decoupled State-Execute Architecture. The resulting register file can be accessed in the pipeline front-end and has several desirable properties that allow efficient application of several optimizations, most notably register file banking and a novel writeback filtering mechanism. As a result, only a 1.0% IPC degradation was observed with aggressive banking, and the new writeback filtering technique lowered energy consumption. Together, the two optimizations remove approximately 80% of the energy consumed in the register file data array.
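
As a rough illustration of why register file banking can cost some IPC, the sketch below counts read-port conflicts in a banked register file. It is a generic toy and not the paper's banking scheme: the modulo register-to-bank mapping, the function names `bank_of` and `count_conflicts`, and the `ports_per_bank` parameter are assumptions introduced for this example.

```python
from collections import Counter


def bank_of(reg, num_banks):
    # Assumed mapping: registers are interleaved across banks by index.
    return reg % num_banks


def count_conflicts(reads_per_cycle, num_banks, ports_per_bank=1):
    """Count reads that exceed the read ports available in their bank.

    reads_per_cycle is a list of cycles, each a list of register indices
    read in that cycle. Two reads of the same register count as two reads.
    """
    conflicts = 0
    for reads in reads_per_cycle:
        per_bank = Counter(bank_of(r, num_banks) for r in reads)
        conflicts += sum(max(0, n - ports_per_bank) for n in per_bank.values())
    return conflicts


cycles = [[0, 4, 1], [2, 3], [5, 9, 13]]
print(count_conflicts(cycles, num_banks=4))                    # 3
print(count_conflicts(cycles, num_banks=4, ports_per_bank=2))  # 1
```

Each conflict in this model would have to be resolved by delaying a read, so fewer ports per bank saves energy and area at the risk of lost cycles. The abstract reports that in the proposed architecture this loss stays at about 1.0% IPC.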


References

1. Tendler, J., Dodson, S., Fields, S., Le, B.S.H.: Power4 system microarchitecture. IBM Journal of Research and Development 46(1) (2002)
2. Hinton, G., Sager, D., Upton, M., Boggs, D., Carmean, D., Kyker, A., Roussel, P.: The microarchitecture of the Pentium 4 processor. Intel Technology Journal (2001)
3. Keltcher, C., McGrath, K., Ahmed, A., Conway, P.: The AMD Opteron processor for multiprocessor servers. IEEE Micro 23, 66–76 (2003)
4. Gowan, M.K., Biro, L.L., Jackson, D.B.: Power considerations in the design of the Alpha 21264. In: Proc. of the 35th Design Automation Conference (1998)
5. Tomasulo, R.M.: An efficient algorithm for exploiting multiple arithmetic units. IBM Journal of Research and Development, 25–33 (January 1967)
6. Liptay, J.: Design of the IBM Enterprise System/9000 high-end processor. IBM Journal of Research and Development 36(4) (July 1992)
7. Yeager, K.C.: The MIPS R10000 superscalar microprocessor. IEEE Micro 16, 28–41 (1996)
8. Zyuban, V., Kogge, P.: The energy complexity of register files. In: Intl. Symp. on Low Energy Electronics and Design, pp. 305–310 (1998)
9. Park, I., Powell, M.D., Vijaykumar, T.: Reducing register ports for higher speed and lower energy. In: Proc. of the 35th Annual Intl. Symposium on Microarchitecture (December 2002)
10. Kim, N.S., Mudge, T.: Reducing register ports using delayed write-back queues and operand pre-fetch. In: Proc. of the 17th ACM Intl. Conf. on Supercomputing (June 2003)
11. González, R., Cristal, A., Ortega, D., Veidenbaum, A., Valero, M.: A content aware integer register file organisation. In: Proc. of the 31st Intl. Symp. on Computer Architecture (2004)
12. Cruz, J., González, A., Valero, M., Topham, N.: Multiple-banked register file architecture. In: Proc. of the 27th Intl. Symp. on Computer Architecture, pp. 316–325 (2000)
13. Balasubramonian, R., Dwarkadas, S., Albonesi, D.: Reducing the complexity of the register file in dynamic superscalar processors. In: Proc. of the 34th Intl. Symp. on Microarchitecture (2001)
14. Zalamea, J., Llosa, J., Ayguadé, E., Valero, M.: Two-level hierarchical register file organization for VLIW processors. In: Proc. of the 33rd Intl. Symp. on Microarchitecture (MICRO-33), pp. 137–146 (2000)
15. Palacharla, S., Jouppi, N., Smith, J.: Complexity-effective superscalar processors. In: Proc. of the 24th Intl. Symp. on Computer Architecture (1997)
16. Kessler, R.: The Alpha 21264 microprocessor. IEEE Micro 19 (March 1999)
17. Seznec, A., Toullec, E., Rochecouste, O.: Register write specialization register read specialization: a path to complexity-effective wide-issue superscalar processors. In: Proc. of the 35th Intl. Symp. on Microarchitecture, pp. 383–394 (2002)
18. Smith, J.E., Pleszkun, A.R.: Implementation of precise interrupts in pipelined processors. In: Proc. of the 12th Intl. Symp. on Computer Architecture, pp. 34–44 (1985)
19. Johnson, M.: Superscalar Microprocessor Design. Prentice-Hall, Englewood Cliffs (1990)
20. Austin, T., Larson, E., Ernst, D.: Simplescalar: an infrastructure for computer system modeling. IEEE Computer (2002)
21. Perelman, E., Hamerly, G., Biesbrouck, M.V., Sherwood, T., Calder, B.: Using SimPoint for accurate and efficient simulation. In: Proc. of the Intl. Conf. on Measurement and Modeling of Computer Systems (2003)
22. Tseng, J., Asanovic, K.: Banked multiported register files for high-frequency superscalar microprocessors. In: Proc. of the 30th Annual Intl. Symp. on Computer Architecture (2003)
23. Rixner, S., Dally, W.J., Khailany, B., Mattson, P.R., Kapasi, U.J., Owens, J.D.: Register organization for media processing. In: Proc. of the 6th Intl. Symp. on High Performance Computer Architecture, pp. 375–386 (2000)


Author information

Authors and Affiliations

Computer Architecture Department, Technical University of Catalonia (UPC), Jordi Girona, 1-3, Mòdul D6 Campus Nord, 08034, Barcelona, Spain
Miquel Pericàs, Ruben González & Mateo Valero
Computer Sciences, Barcelona Supercomputing Center (BSC), Jordi Girona, 29, Edifici Nexus-II Campus Nord, 08034, Barcelona, Spain
Miquel Pericàs, Adrian Cristal & Mateo Valero
Department of Computer Science, University of California (UCI), 3019 Donald Bren Hall, Irvine, CA, 92697-3435, USA
Alex Veidenbaum


Editor information

Jesús Labarta, Kazuki Joe, Toshinori Sato


About this paper

Cite this paper

Pericàs, M., Cristal, A., González, R., Veidenbaum, A., Valero, M. (2008). Decoupled State-Execute Architecture. In: Labarta, J., Joe, K., Sato, T. (eds) High-Performance Computing. ISHPC ALPS 2005 2006. Lecture Notes in Computer Science, vol 4759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77704-5_6


DOI: https://doi.org/10.1007/978-3-540-77704-5_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77703-8
Online ISBN: 978-3-540-77704-5
eBook Packages: Computer Science, Computer Science (R0)
