Skip to main content

Receive-Side Notification for Enhanced RDMA in FPGA Based Networks

  • Conference paper
  • First Online:
Architecture of Computing Systems – ARCS 2019 (ARCS 2019)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 11479))

Included in the following conference series:

  • 1106 Accesses

Abstract

FPGAs are rapidly gaining traction in the domain of HPC thanks to the advent of FPGA-friendly data-flow workloads, as well as their flexibility and energy efficiency. However, these devices pose a new challenge in terms of how to better support their communications, since standard protocols are known to hinder their performance greatly either by requiring CPU intervention or consuming excessive FPGA logic. Hence, the community is moving towards custom-made solutions. This paper analyses an optimization to our custom, reliable, interconnect with connectionless transport—a mechanism to register and track inbound RDMA communication at the receive-side. This way, it provides completion notifications directly to the remote node which saves a round-trip latency. The entire mechanism is designed to sit within the fabric of the FPGA, requiring no software intervention. Our solution is able to reduce the latency of a receive operation by around 20\(\%\) for small message sizes (4 KB) over a single hop (longer distances would experience even higher improvement). Results from synthesis over a wide parameter range confirm this optimization is scalable both in terms of the number of concurrent outstanding RDMA operations, and the maximum message size.

This work was funded by the European Union’s Horizon 2020 research and innovation programme under grant agreements No 671553 and 754337.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 54.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 69.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Caulfield, A.M., et al.: A cloud-scale acceleration architecture. In: The 49th Annual IEEE/ACM International Symposium on Microarchitecture, p. 7. IEEE Press (2016)

    Google Scholar 

  2. Concatto, C., et al.: A CAM-free exascalable HPC router for low-energy communications. In: Berekovic, M., Buchty, R., Hamann, H., Koch, D., Pionteck, T. (eds.) ARCS 2018. LNCS, vol. 10793, pp. 99–111. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-77610-1_8

    Chapter  Google Scholar 

  3. Dally, W.J., Aoki, H.: Deadlock-free adaptive routing in multicomputer networks using virtual channels. IEEE Trans. Parallel Distrib. Syst. 4(4), 466–475 (1993)

    Article  Google Scholar 

  4. El-Ghazawi, T., El-Araby, E., Huang, M., Gaj, K., Kindratenko, V., Buell, D.: The promise of high-performance reconfigurable computing. Computer 41(2), 69–76 (2008)

    Article  Google Scholar 

  5. Grant, R.E., Rashti, M.J., Balaji, P., Afsahi, A.: Scalable connectionless RDMA over unreliable datagrams. Parallel Comput. 48, 15–39 (2015)

    Article  Google Scholar 

  6. Katevenis, M., et al.: Next generation of exascale-class systems: exanest project and the status of its interconnect and storage development. Microprocess. Microsyst. 61, 58–71 (2018)

    Article  Google Scholar 

  7. Katevenis, M., et al.: The exanest project: interconnects, storage, and packaging for exascale systems. In: 2016 Euromicro Conference on Digital System Design (DSD), pp. 60–67. IEEE (2016)

    Google Scholar 

  8. Koop, M.J., Sur, S., Gao, Q., Panda, D.K.: High performance MPI design using unreliable datagram for ultra-scale infiniband clusters. In: Proceedings of the 21st Annual International Conference on Supercomputing, pp. 180–189. ACM (2007)

    Google Scholar 

  9. Lant, J., et al.: Enabling shared memory communication in networks of mpsocs. Concurr. Comput. Pract. Exp. (CCPE), e4774 (2018)

    Google Scholar 

  10. Mogul, J.C.: TCP offload is a dumb idea whose time has come. In: HotOS, pp. 25–30 (2003)

    Google Scholar 

  11. Ovtcharov, K., Ruwase, O., Kim, J.Y., Fowers, J., Strauss, K., Chung, E.S.: Accelerating deep convolutional neural networks using specialized hardware. Microsoft Res. Whitepaper 2(11), 1–4 (2015)

    Google Scholar 

  12. PLDA: An implementation of the TCP/IP protocol suite for the Linux operating system (2018). https://github.com/torvalds/linux/blob/master/net/ipv4/tcp.c

  13. Intilop Corporation: 10 g bit TCP offload engine + PCIe/DMA soc IP (2012)

    Google Scholar 

  14. Ohio Supercomputing Centre: Software implementation and testing of iWarp protocol (2018). https://www.osc.edu/research/network_file/projects/iwarp

  15. Sidler, D., Alonso, G., Blott, M., Karras, K., Vissers, K., Carley, R.: Scalable 10Gbps TCP/IP stack architecture for reconfigurable hardware. In: 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), pp. 36–43. IEEE (2015)

    Google Scholar 

  16. Underwood, K.D., Hemmert, K.S., Ulmer, C.D.: From silicon to science: the long road to production reconfigurable supercomputing. ACM Trans. Reconfigurable Technol. Syst. (TRETS) 2(4), 26 (2009)

    Google Scholar 

  17. Xilinx Inc.: Zynq UltraScale + MPSoC Data Sheet: Overview (2018). v1.7

    Google Scholar 

  18. Xirouchakis, P., et al.: The network interface of the exanest hpc prototype. Technical report, ICS-FORTH / TR 471, Heraklion, Crete, Greece (2018)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Joshua Lant .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Lant, J., Attwood, A., Navaridas, J., Lujan, M., Goodacre, J. (2019). Receive-Side Notification for Enhanced RDMA in FPGA Based Networks. In: Schoeberl, M., Hochberger, C., Uhrig, S., Brehm, J., Pionteck, T. (eds) Architecture of Computing Systems – ARCS 2019. ARCS 2019. Lecture Notes in Computer Science(), vol 11479. Springer, Cham. https://doi.org/10.1007/978-3-030-18656-2_17

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-18656-2_17

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-18655-5

  • Online ISBN: 978-3-030-18656-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics