FPGA Overlays

  • Hayden Kwok-Hay So
  • Cheng Liu


Developing applications for FPGAs is a very different experience from writing software. Not only is the hardware design process fundamentally different from software development, but software programmers also find themselves battling the much lower productivity of hardware design. In this chapter, we explore how FPGA overlays may alleviate some of these burdens. We look at how an overlay architecture lets designers compile applications to FPGA hardware in seconds instead of hours, how overlays help with design portability, and how they improve the debugging capabilities of low-level designs. Finally, we explore the challenges and opportunities for future research in this area.


Keywords (machine-generated, not supplied by the authors): Area Overhead; Hardware Accelerator; Data Flow Graph; Configurable Fabric; FPGA Resource



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam, Hong Kong
