Receive-Side Notification for Enhanced RDMA in FPGA Based Networks

Lant, Joshua; Attwood, Andrew; Navaridas, Javier; Lujan, Mikel; Goodacre, John

doi:10.1007/978-3-030-18656-2_17

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11479)

Included in the following conference series:

International Conference on Architecture of Computing Systems

Abstract

FPGAs are rapidly gaining traction in the domain of HPC thanks to the advent of FPGA-friendly data-flow workloads, as well as their flexibility and energy efficiency. However, these devices pose a new challenge in terms of how best to support their communications, since standard protocols are known to greatly hinder their performance, either by requiring CPU intervention or by consuming excessive FPGA logic. Hence, the community is moving towards custom-made solutions. This paper analyses an optimization to our custom, reliable interconnect with connectionless transport: a mechanism to register and track inbound RDMA communication at the receive side. In this way, completion notifications are provided directly to the remote node, saving one round-trip latency. The entire mechanism is designed to sit within the fabric of the FPGA, requiring no software intervention. Our solution reduces the latency of a receive operation by around 20% for small message sizes (4 KB) over a single hop (longer distances would see an even greater improvement). Results from synthesis over a wide parameter range confirm that this optimization is scalable both in the number of concurrent outstanding RDMA operations and in the maximum message size.
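The abstract describes the receive-side mechanism only at a high level. As an illustration, here is a minimal Python software model of a receive-side tracking table. It assumes a reliable network that delivers every packet exactly once but possibly out of order, so completion can be detected simply by counting payload bytes. The class `ReceiveTracker`, its methods `register` and `on_packet`, and the parameters `max_outstanding` and `max_message_bytes` are hypothetical names chosen for this sketch; they are not the authors' FPGA design.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class _Entry:
    total_bytes: int         # expected size of the inbound RDMA transfer
    received_bytes: int = 0  # payload bytes delivered so far


class ReceiveTracker:
    """Hypothetical receive-side table of outstanding inbound RDMA operations.

    Assumption: the network is reliable and delivers each packet exactly once,
    possibly out of order, so summing payload lengths detects completion.
    """

    def __init__(self, max_outstanding: int, max_message_bytes: int) -> None:
        self.max_outstanding = max_outstanding      # table depth (hardware: number of entries)
        self.max_message_bytes = max_message_bytes  # largest transfer; sets byte-counter width in hardware
        self._table: Dict[int, _Entry] = {}
        self.notifications: List[int] = []          # completed operation ids, in completion order

    def register(self, op_id: int, total_bytes: int) -> bool:
        """Reserve an entry for an expected transfer; return False if it cannot be tracked."""
        if op_id in self._table or len(self._table) >= self.max_outstanding:
            return False
        if not 0 < total_bytes <= self.max_message_bytes:
            return False
        self._table[op_id] = _Entry(total_bytes)
        return True

    def on_packet(self, op_id: int, length: int) -> bool:
        """Account for one packet's payload; return True when the transfer completes."""
        entry = self._table.get(op_id)
        if entry is None:
            raise KeyError(f"packet for unregistered operation {op_id}")
        entry.received_bytes += length
        if entry.received_bytes > entry.total_bytes:
            raise ValueError(f"operation {op_id} received more bytes than expected")
        if entry.received_bytes == entry.total_bytes:
            del self._table[op_id]            # free the slot for another operation
            self.notifications.append(op_id)  # completion detected at the receiver itself
            return True
        return False


if __name__ == "__main__":
    tracker = ReceiveTracker(max_outstanding=4, max_message_bytes=4096)
    tracker.register(op_id=7, total_bytes=4096)
    # Four 1 KB packets; only byte counts are tracked, so arrival order does not matter.
    results = [tracker.on_packet(7, 1024) for _ in range(4)]
    print(results)                # [False, False, False, True]
    print(tracker.notifications)  # [7]
```

In this sketch the two constructor parameters mirror the two scalability axes named in the abstract: in hardware, the number of table entries grows with `max_outstanding`, and the width of each byte counter grows with the logarithm of `max_message_bytes`. How the authors actually size these resources is not stated in this summary.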

This work was funded by the European Union’s Horizon 2020 research and innovation programme under grant agreements No 671553 and 754337.

Author information

Authors and Affiliations

University of Manchester, Manchester, M13 9PL, UK
Joshua Lant, Andrew Attwood, Javier Navaridas, Mikel Lujan & John Goodacre

Corresponding author

Correspondence to Joshua Lant.

Editor information

Editors and Affiliations

Technical University of Denmark, Lyngby, Denmark
Martin Schoeberl
Technical University of Darmstadt, Darmstadt, Germany
Christian Hochberger
Airbus Defence and Space GmbH, Taufkirchen, Germany
Sascha Uhrig
University of Hanover, Hanover, Germany
Jürgen Brehm
Otto-von-Guericke University, Magdeburg, Germany
Thilo Pionteck

About this paper

Cite this paper

Lant, J., Attwood, A., Navaridas, J., Lujan, M., Goodacre, J. (2019). Receive-Side Notification for Enhanced RDMA in FPGA Based Networks. In: Schoeberl, M., Hochberger, C., Uhrig, S., Brehm, J., Pionteck, T. (eds) Architecture of Computing Systems – ARCS 2019. ARCS 2019. Lecture Notes in Computer Science, vol 11479. Springer, Cham. https://doi.org/10.1007/978-3-030-18656-2_17

DOI: https://doi.org/10.1007/978-3-030-18656-2_17
Published: 25 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18655-5
Online ISBN: 978-3-030-18656-2
eBook Packages: Computer Science, Computer Science (R0)