2024
Conference Paper
Title
An Approach Towards Distributed DNN Training on FPGA Clusters
Abstract
We present NADA, a Network Attached Deep learning Accelerator. It provides a flexible hardware/software framework for training deep neural networks on Ethernet-based FPGA clusters. The NADA hardware framework instantiates a dedicated entity for each layer of a model; features and gradients flow through these entities in a tightly pipelined manner. From a compact description of the model and the target cluster, the NADA software framework generates a specific configuration bitstream for each FPGA in the cluster. We demonstrate the scalability and flexibility of our approach by mapping an example CNN onto clusters of three to nine Intel Arria 10 FPGAs. To verify NADA's effectiveness for commonly used networks, we train MobileNetV2 on a six-node cluster. We address the inherent incompatibility of the tightly pipelined layer-parallel approach with batch normalization by using online normalization instead.
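The final point of the abstract may benefit from a brief illustration. Online normalization (Chiley et al., 2019) replaces per-batch statistics with exponential running estimates that are updated one sample at a time, which is what makes it compatible with a tightly pipelined, sample-by-sample dataflow where no full batch is ever materialized. The following is a minimal, simplified sketch of the forward pass only; the class name, hyperparameters, and update details are illustrative assumptions, not NADA's implementation:

```python
import numpy as np

class OnlineNorm1D:
    """Simplified forward-pass sketch of online normalization.

    Unlike batch normalization, the mean and variance are exponential
    running estimates updated per sample, so normalization never needs
    a batch dimension -- suitable for a tightly pipelined layer-parallel
    dataflow. Decay factor and epsilon are illustrative choices.
    """

    def __init__(self, num_features, decay=0.999, eps=1e-5):
        self.mu = np.zeros(num_features)   # running mean estimate
        self.var = np.ones(num_features)   # running variance estimate
        self.decay = decay
        self.eps = eps

    def forward(self, x):
        # Normalize the incoming sample with the current running statistics.
        y = (x - self.mu) / np.sqrt(self.var + self.eps)
        # Then fold this sample into the running estimates.
        self.var = self.decay * self.var + (1 - self.decay) * (x - self.mu) ** 2
        self.mu = self.decay * self.mu + (1 - self.decay) * x
        return y
```

In a streaming pipeline, each arriving feature vector would be passed through `forward` as it flows between layer entities; the full method in the cited paper also compensates gradients in the backward pass, which this sketch omits.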