Architecture for Transparent Binary Acceleration of Loops with Memory Accesses
This paper presents an extension to a hardware/software system architecture in which repetitive instruction traces, called Megablocks, are migrated to a Reconfigurable Processing Unit (RPU). This scheme is supported by a custom toolchain that automatically generates an RPU tailored to the execution of one or more Megablocks detected offline. Switching between hardware and software execution is transparent and requires no modifications to source code or executable binaries. Our approach has been evaluated using an architecture with a MicroBlaze General Purpose Processor (GPP) softcore. A memory sharing mechanism gives the RPU access to the GPP's data memory, which allows the acceleration of Megablocks that contain load/store operations. For a set of 21 embedded benchmarks, an average speedup of 1.43× is achieved, and a potential speedup of 2.09× is predicted for an implementation using a low-overhead interface for communication between the GPP and the RPU.
Keywords: reconfigurable processor, memory access, Megablock, instruction trace, MicroBlaze, hardware acceleration, FPGA