2023
Conference Paper
Title
Demonstrating NADA: A Workflow for Distributed CNN Training on FPGA Clusters
Abstract
We introduce NADA, our Network Attached Deep learning Accelerator: a novel, flexible HW/SW framework for efficiently training deep neural networks on FPGA clusters. NADA is centered around layer parallelism: it instantiates a dedicated implementation for each layer and places these implementations across the desired number of network-attached FPGAs in the cluster. The NADA hardware architecture relies on a high-speed, pure-hardware UDP/IP network stack. We demonstrate the usability of our approach by training two demonstration networks on an Arria 10 FPGA cluster.
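To illustrate the layer-parallel idea described in the abstract, the following sketch shows one simple way a CNN's layers could be assigned, in pipeline order, to a given number of network-attached FPGAs. This is a hypothetical illustration only: the function and layer names are assumptions, and NADA's actual workflow generates per-layer hardware implementations rather than a software mapping like this.

```python
def assign_layers(layers, num_fpgas):
    """Split an ordered list of layer names into contiguous groups,
    one group per FPGA, preserving the pipeline order of the network.
    Illustrative only; not the actual NADA placement algorithm."""
    n = len(layers)
    base, extra = divmod(n, num_fpgas)  # spread any remainder over the first FPGAs
    mapping, start = {}, 0
    for fpga in range(num_fpgas):
        count = base + (1 if fpga < extra else 0)
        mapping[fpga] = layers[start:start + count]
        start += count
    return mapping

# Hypothetical demo network with six layers, mapped onto three FPGAs:
demo_cnn = ["conv1", "pool1", "conv2", "pool2", "fc1", "fc2"]
print(assign_layers(demo_cnn, 3))
# {0: ['conv1', 'pool1'], 1: ['conv2', 'pool2'], 2: ['fc1', 'fc2']}
```

Keeping each FPGA's layers contiguous matters in a layer-parallel setting: activations and gradients then only cross the network at group boundaries, which suits a streaming pipeline over a hardware UDP/IP stack.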