Scalable Multi-cores with Improved Per-core Performance Using Off-the-critical Path Reconfigurable Hardware

Suri, Tameesh; Aggarwal, Aneesh

doi:10.1007/978-3-540-89894-8_33

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5374)

Included in the following conference series:

International Conference on High-Performance Computing

Abstract

Scaling the number of cores in a multi-core processor constrains the resources available in each core, resulting in reduced per-core performance. Alternatively, the number of cores has to be reduced in order to improve per-core performance. In this paper, we propose a technique to improve the per-core performance in a many-core processor without reducing the number of cores. In particular, we integrate a Reconfigurable Hardware Unit (RHU) in each core. The RHU executes the frequently encountered instructions to increase the core’s overall execution bandwidth, thus improving its performance. We also propose a novel integrated hardware/software methodology for efficient RHU reconfiguration. The RHU has low area overhead, and hence has minimal impact on the scalability of the multi-core. Our experiments show that the proposed architecture improves the per-core performance by an average of about 12% across a wide range of applications, while incurring a per-core area overhead of only about 5%.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, State University of New York at Binghamton, Binghamton, NY, 13902, USA
Tameesh Suri & Aneesh Aggarwal

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, The Ohio State University, 2015 Neil Avenue, Columbus, OH 43210, USA
Ponnuswamy Sadayappan
Department of Electrical and Computer Engineering, Rutgers, the State University of New Jersey, 94 Brett Road, Piscataway, NJ 08854, USA
Manish Parashar
Hewlett-Packard ISO, Sy 192, Whitefield Road, Mahadevapura Post, Bangalore, 560048, India
Ramamurthy Badrinath
Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089-2562, USA
Viktor K. Prasanna

Cite this paper

Suri, T., Aggarwal, A. (2008). Scalable Multi-cores with Improved Per-core Performance Using Off-the-critical Path Reconfigurable Hardware. In: Sadayappan, P., Parashar, M., Badrinath, R., Prasanna, V.K. (eds) High Performance Computing - HiPC 2008. HiPC 2008. Lecture Notes in Computer Science, vol 5374. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89894-8_33

DOI: https://doi.org/10.1007/978-3-540-89894-8_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89893-1
Online ISBN: 978-3-540-89894-8
eBook Packages: Computer Science, Computer Science (R0)
