2024
Conference Paper
Title
An Approach Towards Distributed DNN Training on FPGA Clusters
Abstract
We present NADA, a Network Attached Deep learning Accelerator. It provides a flexible hardware/software framework for training deep neural networks on Ethernet-based FPGA clusters. The NADA hardware framework instantiates a dedicated entity for each layer of a model; features and gradients flow through these entities in a tightly pipelined manner. From a compact description of the model and the target cluster, the NADA software framework generates a specific configuration bitstream for each FPGA in the cluster. We demonstrate the scalability and flexibility of our approach by mapping an example CNN onto clusters of three to nine Intel Arria 10 FPGAs. To verify NADA's effectiveness for commonly used networks, we train MobileNetV2 on a six-node cluster. We address the inherent incompatibility of the tightly pipelined layer-parallel approach with batch normalization by using online normalization instead.
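The final point of the abstract may benefit from a brief illustration. Online normalization (Chiley et al., 2019) replaces per-batch statistics with exponential running estimates that are updated one sample at a time, which is what makes it compatible with a tightly pipelined, sample-by-sample dataflow where no full batch is ever materialized. The following is a minimal, simplified sketch of the forward pass only; the class name, hyperparameters, and update details are illustrative assumptions, not NADA's implementation:

```python
import numpy as np

class OnlineNorm1D:
    """Simplified forward-pass sketch of online normalization.

    Unlike batch normalization, the mean and variance are exponential
    running estimates updated per sample, so normalization never needs
    a batch dimension -- suitable for a tightly pipelined layer-parallel
    dataflow. Decay factor and epsilon are illustrative choices.
    """

    def __init__(self, num_features, decay=0.999, eps=1e-5):
        self.mu = np.zeros(num_features)   # running mean estimate
        self.var = np.ones(num_features)   # running variance estimate
        self.decay = decay
        self.eps = eps

    def forward(self, x):
        # Normalize the incoming sample with the current running statistics.
        y = (x - self.mu) / np.sqrt(self.var + self.eps)
        # Then fold this sample into the running estimates.
        self.var = self.decay * self.var + (1 - self.decay) * (x - self.mu) ** 2
        self.mu = self.decay * self.mu + (1 - self.decay) * x
        return y
```

In a streaming pipeline, each arriving feature vector would be passed through `forward` as it flows between layer entities; the full method in the cited paper also compensates gradients in the backward pass, which this sketch omits.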