Scalable Multi-cores with Improved Per-core Performance Using Off-the-critical Path Reconfigurable Hardware

  • Conference paper
High Performance Computing - HiPC 2008 (HiPC 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5374)

Abstract

Scaling the number of cores in a multi-core processor constrains the resources available in each core, resulting in reduced per-core performance. Alternatively, the number of cores has to be reduced in order to improve per-core performance. In this paper, we propose a technique to improve the per-core performance in a many-core processor without reducing the number of cores. In particular, we integrate a Reconfigurable Hardware Unit (RHU) in each core. The RHU executes frequently encountered instructions to increase the core’s overall execution bandwidth, thus improving its performance. We also propose a novel integrated hardware/software methodology for efficient RHU reconfiguration. The RHU has low area overhead, and hence has minimal impact on the scalability of the multi-core. Our experiments show that the proposed architecture improves the per-core performance by an average of about 12% across a wide range of applications, while incurring a per-core area overhead of only about 5%.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Suri, T., Aggarwal, A. (2008). Scalable Multi-cores with Improved Per-core Performance Using Off-the-critical Path Reconfigurable Hardware. In: Sadayappan, P., Parashar, M., Badrinath, R., Prasanna, V.K. (eds) High Performance Computing - HiPC 2008. HiPC 2008. Lecture Notes in Computer Science, vol 5374. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89894-8_33

  • DOI: https://doi.org/10.1007/978-3-540-89894-8_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89893-1

  • Online ISBN: 978-3-540-89894-8

  • eBook Packages: Computer Science, Computer Science (R0)
