GPU-Accelerated Language and Communication Support by FPGA

Boku, Taisuke; Hanawa, Toshihiro; Murai, Hitoshi; Nakao, Masahiro; Miki, Yohei; Amano, Hideharu; Umemura, Masayuki

doi:10.1007/978-981-13-1924-2_15


Abstract

Although the GPU is one of the most successfully used accelerators for HPC, several issues arise when it is used in large-scale parallel systems. To write real applications for GPU-equipped parallel systems, programmers must combine different programming paradigms, such as CUDA/OpenCL, MPI, and OpenMP. On the hardware side, inter-GPU communication must pass through the PCIe channel with support from the CPU, which incurs large overhead and becomes a bottleneck for overall parallel processing performance. In the project described in this chapter, we developed an FPGA-based platform to reduce the latency of inter-GPU communication, as well as a PGAS language for distributed-memory programming with accelerating devices such as GPUs. This work provides a new approach to compensating for the hardware and software weaknesses of parallel GPU computing. Moreover, we describe FPGA technology for accelerating both computation and communication, applied to an astrophysical problem for which GPU or CPU computation alone does not deliver sufficient performance.
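The overhead argument above can be illustrated with a minimal sketch of a hypothetical additive latency model. It is not the authors' evaluation method, and all names (`Link`, `staged_transfer_time`, `direct_transfer_time`) and parameter values are illustrative assumptions rather than measured figures. The model assumes that a conventional transfer between GPUs on different nodes is staged through host memory as three serialized steps (a device-to-host PCIe copy, a CPU-driven network transfer, and a host-to-device PCIe copy), each adding a fixed CPU software overhead, while an idealized direct GPU-to-GPU path is modeled as a single transfer with one startup latency.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical link model: fixed startup latency plus size / bandwidth."""
    latency_s: float
    bandwidth_bytes_per_s: float

    def time(self, nbytes: int) -> float:
        return self.latency_s + nbytes / self.bandwidth_bytes_per_s


def staged_transfer_time(nbytes: int, pcie: Link, network: Link,
                         cpu_overhead_s: float) -> float:
    """GPU -> host -> network -> host -> GPU, each step driven by the CPU.

    Assumption: the three copies are serialized (no pipelining) and each
    step adds the same fixed CPU software overhead.
    """
    steps = [pcie.time(nbytes), network.time(nbytes), pcie.time(nbytes)]
    return sum(steps) + len(steps) * cpu_overhead_s


def direct_transfer_time(nbytes: int, direct: Link) -> float:
    """Idealized direct GPU-to-GPU path: one transfer, no host staging."""
    return direct.time(nbytes)


if __name__ == "__main__":
    # Illustrative values only: 1 us startup latency and 10 GB/s for every
    # link, plus 1 us of CPU overhead per staged step.
    link = Link(latency_s=1e-6, bandwidth_bytes_per_s=1e10)
    for nbytes in (8, 10**9):
        staged = staged_transfer_time(nbytes, link, link, cpu_overhead_s=1e-6)
        direct = direct_transfer_time(nbytes, link)
        print(f"{nbytes} bytes: staged/direct = {staged / direct:.2f}")
```

Under these assumptions, small messages are dominated by the fixed startup latencies and CPU overheads, so the staged path is about six times slower than the direct one for an 8-byte message; for a 1 GB message the bandwidth terms dominate and the ratio approaches three, the number of serialized copies. This is why latency, rather than bandwidth, is the main target for small, frequent inter-GPU messages.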


Notes

1.
Before starting this research, we had developed an earlier PCIe-based communication system; the system described here is therefore named the second version.


Author information

Authors and Affiliations

Center for Computational Sciences, University of Tsukuba, Tsukuba, Japan
Taisuke Boku & Masayuki Umemura
Information Technology Center, The University of Tokyo, Tokyo, Japan
Toshihiro Hanawa & Yohei Miki
Center for Computational Science, RIKEN, Kobe, Japan
Hitoshi Murai & Masahiro Nakao
Department of Information and Computer Science, Keio University, Tokyo, Japan
Hideharu Amano

Corresponding author

Correspondence to Taisuke Boku.

Editor information

Editors and Affiliations

RIKEN Center for Computational Science, Kobe, Japan
Mitsuhisa Sato


About this chapter

Cite this chapter

Boku, T. et al. (2019). GPU-Accelerated Language and Communication Support by FPGA. In: Sato, M. (ed.) Advanced Software Technologies for Post-Peta Scale Computing. Springer, Singapore. https://doi.org/10.1007/978-981-13-1924-2_15


DOI: https://doi.org/10.1007/978-981-13-1924-2_15
Published: 07 December 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1923-5
Online ISBN: 978-981-13-1924-2
eBook Packages: Computer Science, Computer Science (R0)
