2020
Conference Paper
Title
A High-Throughput, Resource-Efficient Implementation of the RoCEv2 Remote DMA Protocol for Network-Attached Hardware Accelerators
Abstract
The integration and use of application-specific processor cores and accelerators in data center installations has been state of the art for at least a decade, since the advent of GPGPUs. In most cases these accelerators are coupled to their host computers via standard PCIe interfaces, which leads to disadvantages in interoperability, scalability and overall power consumption. As a viable alternative to PCIe-attached FPGA accelerators, this paper proposes standalone FPGAs as network-attached accelerators (NAAs). To provide all necessary infrastructure for decoupled FPGAs, we present a framework incorporating a network stack that implements the InfiniBand RDMA over Converged Ethernet v2 (RoCEv2) protocol as well as UDP/IP communication for high-speed, low-latency data transfer. We present the requirements such a framework has to fulfill and show how our network stack satisfies them. For NAAs to be used instead of PCIe-coupled FPGAs, the framework must use as few resources as possible while providing similar throughput and latency. Accordingly, we show that our network stack achieves 100 Gb/s throughput with latencies of less than 4 µs while using less than 10% of the available resources on a mid-range FPGA. Based on our results, network-attached FPGAs are an attractive alternative to the more energy-intensive PCIe-attached FPGA accelerators.
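To make the RoCEv2 framing a little more concrete, the following is a minimal Python sketch, not the authors' implementation. It packs the 12-byte InfiniBand Base Transport Header (BTH) that RoCEv2 carries inside UDP (destination port 4791). It also estimates the best-case payload goodput of one large RDMA WRITE on a 100 Gb/s link. The helper names `pack_bth` and `rdma_write_goodput_gbps` are hypothetical. The estimate assumes IPv4 without a VLAN tag, a reliable-connected queue pair, no ACK or congestion traffic, and an Ethernet MTU large enough (jumbo frames) for the chosen path MTU.

```python
import math
import struct

# Per-packet bytes on the wire for RoCEv2 over Ethernet (assumption: IPv4, no VLAN).
PREAMBLE_SFD = 8
ETH_HEADER = 14
IPV4_HEADER = 20
UDP_HEADER = 8
BTH = 12    # InfiniBand Base Transport Header
RETH = 16   # RDMA Extended Transport Header, only in the first (or only) WRITE packet
ICRC = 4    # invariant CRC
FCS = 4     # Ethernet frame check sequence
IFG = 12    # minimum inter-frame gap

ROCEV2_UDP_PORT = 4791

# Reliable-connected RDMA WRITE opcodes
OPCODE_RC_RDMA_WRITE_FIRST = 0x06
OPCODE_RC_RDMA_WRITE_MIDDLE = 0x07
OPCODE_RC_RDMA_WRITE_LAST = 0x08
OPCODE_RC_RDMA_WRITE_ONLY = 0x0A


def pack_bth(opcode: int, dest_qp: int, psn: int, pad_count: int = 0,
             p_key: int = 0xFFFF, ack_req: bool = False) -> bytes:
    """Hypothetical helper: pack a 12-byte BTH in network byte order."""
    if not (0 <= dest_qp < 1 << 24 and 0 <= psn < 1 << 24 and 0 <= pad_count < 4):
        raise ValueError("BTH field out of range")
    # Byte 1: SE=0, MigReq=0, PadCnt (2 bits), TVer=0
    flags = pad_count << 4
    # Bytes 4-7: reserved byte followed by the 24-bit destination QP
    # Bytes 8-11: AckReq bit, 7 reserved bits, then the 24-bit PSN
    ack_psn = (int(ack_req) << 31) | psn
    return struct.pack("!BBHII", opcode, flags, p_key, dest_qp, ack_psn)


def rdma_write_goodput_gbps(message_bytes: int, pmtu: int,
                            line_rate_gbps: float = 100.0) -> float:
    """Hypothetical helper: best-case payload rate for one RDMA WRITE."""
    if message_bytes <= 0 or pmtu <= 0:
        raise ValueError("message size and path MTU must be positive")
    packets = math.ceil(message_bytes / pmtu)
    per_packet = (PREAMBLE_SFD + ETH_HEADER + IPV4_HEADER + UDP_HEADER
                  + BTH + ICRC + FCS + IFG)
    padded_payload = message_bytes + (-message_bytes) % 4  # payload padded to 4 bytes
    wire_bytes = padded_payload + packets * per_packet + RETH
    return line_rate_gbps * message_bytes / wire_bytes


if __name__ == "__main__":
    header = pack_bth(OPCODE_RC_RDMA_WRITE_ONLY, dest_qp=0x12, psn=5, ack_req=True)
    print(header.hex())
    for pmtu in (1024, 4096):
        print(pmtu, round(rdma_write_goodput_gbps(1 << 20, pmtu), 2))
```

With these assumptions, per-packet framing costs 82 bytes. A 1 MiB WRITE with a 4096-byte path MTU therefore leaves roughly 98 Gb/s of payload on a 100 Gb/s link, and a 1024-byte path MTU leaves less. This is only a line-rate bound, not a measured result from the paper.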