Trade-Off of Offloading to FPGA in OpenMP Task-Based Programming

Watanabe, Yutaka; Lee, Jinpil; Boku, Taisuke; Sato, Mitsuhisa

doi:10.1007/978-3-319-98521-3_7

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 11128)

Included in the following conference series:

International Workshop on OpenMP

Abstract

In High-Performance Computing (HPC), the Field Programmable Gate Array (FPGA) is attracting increased attention as an accelerator because its performance has improved dramatically in recent years. Meanwhile, task-based programming, recently supported in OpenMP 4.0, makes it possible to expose more parallelism by executing the tasks of a program in the form of a task graph. To accelerate a task-based parallel program with an FPGA, it is useful to offload dominant tasks that are frequently executed in parallel to the FPGA as asynchronous FPGA tasks. We present a performance optimization for OpenMP task-based programming with FPGA tasks that is based on the trade-off between the kernel size and the number of kernels executed asynchronously in parallel, so that FPGA hardware resources are used efficiently. Since a “program” for an FPGA is converted directly into hardware, the hardware resource limitation raises a new optimization issue: which tasks to offload to the FPGA, and how. Taking task-based block Cholesky factorization as a motivating example, we present the trade-off in how to offload the dominant “GEMM” task, which is frequently executed in parallel during execution of the task graph. We found that, under the hardware resource limitation, multiple small kernels are better than a single large high-performance kernel because of their higher throughput and higher kernel frequency.

Author information

Authors and Affiliations

Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Ibaraki, Japan
Yutaka Watanabe, Taisuke Boku & Mitsuhisa Sato
RIKEN Center for Computational Science, Kobe, Hyogo, Japan
Jinpil Lee & Mitsuhisa Sato
Center for Computational Sciences, University of Tsukuba, Tsukuba, Ibaraki, Japan
Taisuke Boku

Corresponding author

Correspondence to Yutaka Watanabe.

Editor information

Editors and Affiliations

Lawrence Livermore National Laboratory, Livermore, CA, USA
Bronis R. de Supinski
Barcelona Supercomputing Center, Barcelona, Barcelona, Spain
Pedro Valero-Lara
Universitat Politècnica de Catalunya, Barcelona, Spain
Xavier Martorell
Barcelona Supercomputing Center, Barcelona, Barcelona, Spain
Sergi Mateo Bellido
Universitat Politècnica de Catalunya, Barcelona, Barcelona, Spain
Jesus Labarta

About this paper

Cite this paper

Watanabe, Y., Lee, J., Boku, T., Sato, M. (2018). Trade-Off of Offloading to FPGA in OpenMP Task-Based Programming. In: de Supinski, B., Valero-Lara, P., Martorell, X., Mateo Bellido, S., Labarta, J. (eds) Evolving OpenMP for Evolving Architectures. IWOMP 2018. Lecture Notes in Computer Science, vol 11128. Springer, Cham. https://doi.org/10.1007/978-3-319-98521-3_7

DOI: https://doi.org/10.1007/978-3-319-98521-3_7
Published: 29 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98520-6
Online ISBN: 978-3-319-98521-3
eBook Packages: Computer Science, Computer Science (R0)
