An EPIC Processor with Pending Functional Units

  • Conference paper
High Performance Computing (ISHPC 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2327)

Abstract

The Itanium processor, an implementation of an Explicitly Parallel Instruction Computing (EPIC) architecture, is an in-order processor: it fetches instructions, executes them, and forwards results to functional units in program order. The architecture relies heavily on the compiler to expose Instruction Level Parallelism (ILP) and so avoid the stalls that in-order processing creates.

This paper examines, in small steps, how the in-order Itanium processor model can be changed to allow out-of-order execution, with the aim of overcoming memory and functional unit latencies. To accomplish this, we consider an architecture with Pending Functional Units (PFU). The PFU architecture assigns (schedules) instructions to functional units in order. Each instruction waits at its pending functional unit until its operands become ready and then executes, possibly out of order with respect to other instructions. While an instruction is pending at a functional unit, no other instruction can be scheduled to that functional unit. We examine several PFU architecture designs. The minimal design does not perform renaming and supports bypassing of non-speculative result values only. We then make PFU more aggressive by supporting speculative register state, and finally by adding register renaming. We show that the minimal PFU architecture provides an 18% average speedup over an in-order EPIC processor and produces up to half of the speedup that a full out-of-order architecture would achieve.
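
The scheduling rule described in the abstract can be illustrated with a toy timing model. The following Python sketch is not the authors' simulator; it is a minimal illustration under several assumptions that do not come from the paper: one instruction issues per cycle, functional units are not pipelined, every instruction writes a distinct register (so renaming never matters), and the names `Instr`, `simulate`, `UNITS`, and `program`, as well as the latencies, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    unit: str      # functional unit the instruction is assigned to (hypothetical names)
    dest: str      # destination register, unique per instruction in this sketch
    srcs: tuple    # source registers
    latency: int   # execution latency in cycles (illustrative values only)


def simulate(program, units, pending):
    """Return the cycle at which the last result becomes available.

    pending=False: in-order model; an instruction cannot issue until its
                   operands are ready, which also stalls everything behind it.
    pending=True:  PFU-style model; an instruction issues in order to a free
                   unit, then waits at that unit until its operands are ready.
    """
    ready_at = {}                          # register -> cycle its value is available
    unit_free_at = {u: 0 for u in units}   # unit -> first cycle it can accept work
    next_issue = 0                         # earliest cycle the next instruction may issue
    finish = 0
    for ins in program:                    # program order equals issue order
        operands_ready = max((ready_at.get(s, 0) for s in ins.srcs), default=0)
        issue = max(next_issue, unit_free_at[ins.unit])
        if not pending:
            issue = max(issue, operands_ready)   # in-order: stall at issue
        start = max(issue, operands_ready)       # PFU: wait at the unit instead
        done = start + ins.latency
        ready_at[ins.dest] = done
        unit_free_at[ins.unit] = done      # unit is occupied while pending and executing
        next_issue = issue + 1             # assumed issue width of one per cycle
        finish = max(finish, done)
    return finish


UNITS = ("mem", "alu0", "alu1")
program = [
    Instr("load", "mem",  "r1", (),           4),
    Instr("add",  "alu0", "r2", ("r1",),      1),  # depends on the load
    Instr("mul",  "alu1", "r3", ("r4", "r5"), 2),  # independent of the load
    Instr("add2", "alu0", "r6", ("r3",),      1),  # must wait for alu0 to free up
]

print("in-order cycles:", simulate(program, UNITS, pending=False))
print("PFU cycles:     ", simulate(program, UNITS, pending=True))
```

For this toy program the in-order model finishes in 8 cycles and the PFU-style model in 6: the independent `mul` no longer waits behind the load-dependent `add`, while the last instruction is still held back because `alu0` is occupied by a pending instruction. These numbers illustrate the mechanism only and say nothing about the speedups reported in the paper.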

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Carter, L., Chuang, W., Calder, B. (2002). An EPIC Processor with Pending Functional Units. In: Zima, H.P., Joe, K., Sato, M., Seo, Y., Shimasaki, M. (eds) High Performance Computing. ISHPC 2002. Lecture Notes in Computer Science, vol 2327. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47847-7_27

  • DOI: https://doi.org/10.1007/3-540-47847-7_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43674-4

  • Online ISBN: 978-3-540-47847-8

  • eBook Packages: Springer Book Archive
