Evaluation of a Floating-Point Intensive Kernel on FPGA

A Case Study of Geodesic Distance Kernel
  • Zheming Jin
  • Hal Finkel
  • Kazutomo Yoshii
  • Franck Cappello
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10659)


Heterogeneous platforms provide a promising solution for high-performance and energy-efficient computing applications. This paper presents our research on the use of a heterogeneous platform for a floating-point intensive kernel. We first introduce the kernel, which is taken from a geographical information system. We then analyze the FPGA designs generated by the Intel FPGA SDK for OpenCL and evaluate their kernel performance and floating-point error rate. Finally, we compare the performance and energy efficiency of the kernel implementations on the Arria 10 FPGA, Intel’s Xeon Phi Knights Landing CPU, and NVIDIA’s Kepler GPU. Our evaluation shows that the energy efficiency of the single-precision kernel on the FPGA is 1.35X better than on the CPU and the GPU, while the energy efficiency of the double-precision kernel on the FPGA is 1.36X and 1.72X lower than on the CPU and the GPU, respectively.
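
The abstract does not say which geodesic formula the kernel implements. Purely to illustrate why a distance kernel of this kind is dominated by floating-point operations, the minimal sketch below uses the haversine great-circle formula on a sphere. The choice of formula, the function name `haversine_km`, and the radius constant are all assumptions made for this illustration; they are not taken from the paper, and the evaluated kernel may use a different (for example ellipsoidal) method.

```python
import math

# Assumed mean Earth radius in kilometres (illustrative; not from the paper).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1_deg, lon1_deg, lat2_deg, lon2_deg):
    """Great-circle distance between two points on a sphere, in kilometres.

    Hypothetical stand-in for a geodesic distance kernel: every step is a
    floating-point multiply, add, square root, or trigonometric call.
    """
    lat1, lon1, lat2, lon2 = map(
        math.radians, (lat1_deg, lon1_deg, lat2_deg, lon2_deg)
    )
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2.0) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2.0) ** 2)
    # Clamp to [0, 1] so rounding error cannot push sqrt/asin out of domain.
    a = min(1.0, max(0.0, a))
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Each point pair costs several trigonometric evaluations plus a square root and an inverse sine, which is the kind of workload where the precision used (single or double) affects both hardware cost and numerical error.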


Keywords: HPC, FPGA, Floating-point operation, OpenCL



We thank the anonymous reviewers and the shepherd for their comments. This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Zheming Jin (1)
  • Hal Finkel (1)
  • Kazutomo Yoshii (1)
  • Franck Cappello (1)
  1. Argonne National Laboratory, Argonne, USA
