Abstract
In High-Performance Computing (HPC), Field Programmable Gate Array (FPGA) is attracting increased attention as an accelerator because its performance has been dramatically improved in recent years. On the other hand, task-based programming recently supported in OpenMP 4.0 enables to expose much parallelism by executing several tasks of the program in the form of a task graph. To accelerate the task-based parallel program by FPGA, it is useful for some dominant tasks frequently executed in parallel to be offloaded to FPGA as an asynchronous FPGA task. We present a performance optimization based on the trade-off between the kernel size and the number of asynchronously executed kernels in parallel in OpenMP task-based programming with FPGA tasks to make use of FPGA hardware resources efficiently. Since a “program” for FPGA is directly converted into the hardware, the hardware resource limitation raises a new issue in optimization on which and how to offload a task to FPGA. Taking task-based block Cholesky factorization as a motivating example, we present the trade-off on how to offload dominant “GEMM” task frequently executed in parallel in the execution of the task-graph. We found that under the limitation of the hardware resource, multiple small kernels are better than a single big high-performance kernel because of higher throughput and higher kernel frequency.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Lee, S., Kim, J., Vetter, J.S.: OpenACC to FPGA: a framework for directive-based high-performance reconfigurable computing. In: 2016 IEEE International Parallel and Distributed Processing Symposium, pp. 544–554. IEEE (2016)
OpenMP. http://www.openmp.org/
The OmpSs Programming Model. https://pm.bsc.es/ompss
OpenCL Overview. https://www.khronos.org/opencl/
Intel FPGA SDK for OpenCL. https://www.altera.com/products/design-software/embedded-software-developers/opencl/overview.html
Zohouri, H.R., Maruyama, N., Smith, A., Matsuda, M., Matsuoka, S.: Evaluating and optimizing OpenCL kernels for high performance computing with FPGAs. In: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, p. 35. IEEE Press (2016)
Kobayashi, R., Oobata, Y., Fujita, N., Yamaguchi, Y., Boku, T.: OpenCL-ready high speed FPGA network for reconfigurable high performance computing. In: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, pp. 192–201. ACM (2018)
Argobots. http://www.argobots.org/
A10PL4 PCIe FPGA Board. https://www.bittware.com/fpga/intel/boards/a10pl4/
Arria 10 FPGA. https://www.altera.com/products/fpga/arria-series/arria-10/overview.html
Lee, J., Petrogalli, F., Hunter, G., Sato, M.: Extending OpenMP SIMD support for target specific code and application to ARM SVE. In: de Supinski, B.R., Olivier, S.L., Terboven, C., Chapman, B.M., Müller, M.S. (eds.) IWOMP 2017. LNCS, vol. 10468, pp. 62–74. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-65578-9_5
Open Accelerator Research Compiler. http://ft.ornl.gov/research/openarc
Filgueras, A., et al.: OmpSs@Zynq all-programmable SoC ecosystem. In: Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 137–146. ACM (2014)
Bosch, J., Filgueras, A., Vidal, M., Jimenez-Gonzalez, D., Alvarez, C., Martorell, X.: Exploiting parallelism on GPUs and FPGAs with OmpSs. In: Proceedings of the 1st Workshop on AutotuniNg and aDaptivity AppRoaches for Energy Efficient HPC Systems, p. 4. ACM (2017)
Intel FPGA SDK for OpenCL Programming Guide. https://www.altera.com/en_US/pdfs/literature/hb/opencl-sdk/aocl_programming_guide.pdf
Intel FPGA SDK for OpenCL Best Practices Guide. https://www.altera.com/en_US/pdfs/literature/hb/opencl-sdk/aocl-best-practices-guide.pdf
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Watanabe, Y., Lee, J., Boku, T., Sato, M. (2018). Trade-Off of Offloading to FPGA in OpenMP Task-Based Programming. In: de Supinski, B., Valero-Lara, P., Martorell, X., Mateo Bellido, S., Labarta, J. (eds) Evolving OpenMP for Evolving Architectures. IWOMP 2018. Lecture Notes in Computer Science(), vol 11128. Springer, Cham. https://doi.org/10.1007/978-3-319-98521-3_7
Download citation
DOI: https://doi.org/10.1007/978-3-319-98521-3_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98520-6
Online ISBN: 978-3-319-98521-3
eBook Packages: Computer ScienceComputer Science (R0)