The Journal of Supercomputing, Volume 75, Issue 3, pp 1732–1746

Cooperative CPU, GPU, and FPGA heterogeneous execution with EngineCL

  • María Angélica Dávila Guzmán
  • Raúl Nozal
  • Rubén Gran Tejero
  • María Villarroya-Gaudó
  • Darío Suárez Gracia
  • Jose Luis Bosque


Heterogeneous systems are the core architecture of most high-performance computing nodes because of their excellent performance and energy efficiency. However, programmability remains a key challenge, specifically releasing the programmer from the burden of managing data and devices with different architectures. To this end, we extend EngineCL to support FPGA devices. Based on OpenCL, EngineCL is a high-level framework that provides load balancing among devices. Our proposal fully integrates FPGAs into the framework, enabling effective cooperation between CPU, GPU, and FPGA. With command overlapping and judicious data management, our work improves performance by up to 96% compared with single-device execution and delivers energy-delay gains of up to 37%. In addition, adopting FPGAs does not require programmers to make major changes to their applications, because the extensions do not modify the user-facing interface of EngineCL.


Heterogeneous scheduling, FPGA, Load balancing, OpenCL



The authors would like to thank the anonymous reviewers and Shaizeen Aga for their suggestions, Luis Piñuel Moreno for his help measuring FPGA power, and NVIDIA and Intel for their generous hardware and software donations. This work was supported in part by Grants TIN2016-76635-C2 (AEI/FEDER, UE), gaZ: T48 research group (Aragón Gov. and European ESF), the University of Zaragoza (JIUZ-2017-TEC-09), HiPEAC4 (European H2020/687698), the Spanish Ministry of Education (FPU16/03299), and the CAPAP-H Network Grant TIN2016-81840-REDT. M. A. Dávila Guzmán is supported by a Universidad de Zaragoza-Banco Santander Ph.D. scholarship.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Universidad de Zaragoza, Zaragoza, Spain
  2. Universidad de Cantabria, Santander, Spain
