Abstract
Scaling the number of cores in a multi-core processor constrains the resources available to each core, resulting in reduced per-core performance. Alternatively, the number of cores has to be reduced in order to improve per-core performance. In this paper, we propose a technique that improves per-core performance in a many-core processor without reducing the number of cores. In particular, we integrate a Reconfigurable Hardware Unit (RHU) into each core. The RHU executes frequently encountered instructions to increase the core’s overall execution bandwidth, thus improving its performance. We also propose a novel integrated hardware/software methodology for efficient RHU reconfiguration. The RHU has a low area overhead and hence minimal impact on the scalability of the multi-core. Our experiments show that the proposed architecture improves per-core performance by an average of about 12% across a wide range of applications, while incurring a per-core area overhead of only about 5%.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Suri, T., Aggarwal, A. (2008). Scalable Multi-cores with Improved Per-core Performance Using Off-the-critical Path Reconfigurable Hardware. In: Sadayappan, P., Parashar, M., Badrinath, R., Prasanna, V.K. (eds) High Performance Computing - HiPC 2008. HiPC 2008. Lecture Notes in Computer Science, vol 5374. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89894-8_33
DOI: https://doi.org/10.1007/978-3-540-89894-8_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89893-1
Online ISBN: 978-3-540-89894-8
eBook Packages: Computer Science, Computer Science (R0)