Distributed training of deep neural networks: Theoretical and practical limits of parallel scalability

Keuper, J.; Pfreundt, F.-J.


Institute of Electrical and Electronics Engineers -IEEE-; IEEE Computer Society; Association for Computing Machinery -ACM-:
2nd Workshop on Machine Learning in HPC Environments, MLHPC 2016 : Held as part of SC 2016, Salt Lake City, Utah, USA, 14 November 2016; Workshop on Machine Learning in High-Performance Computing Environments
Piscataway, NJ: IEEE, 2016
ISBN: 978-1-5090-3882-4
Workshop on Machine Learning in HPC Environments (MLHPC) <2, 2016, Salt Lake City/Utah>
Supercomputing Conference & Expo (SC) <2016, Salt Lake City/Utah>
Conference Paper
Fraunhofer ITWM

This paper presents a theoretical analysis and practical evaluation of the main bottlenecks towards a scalable distributed solution for the training of Deep Neural Networks (DNNs). The presented results show that the current state-of-the-art approach, data-parallelized Stochastic Gradient Descent (SGD), quickly becomes a heavily communication-bound problem. In addition, we present simple but fixed theoretical constraints that prevent effective scaling of DNN training beyond a few dozen nodes. This leads to poor scalability of DNN training in most practical scenarios.
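The saturation effect described in the abstract can be illustrated with a toy scaling model (not taken from the paper): in data-parallel SGD the per-step compute shrinks with the node count N, but the per-step gradient synchronization cost does not. All numeric values below are illustrative assumptions, not measurements.

```python
def speedup(n_nodes, t_compute=1.0, t_comm=0.05):
    """Toy speedup model for data-parallel SGD.

    t_compute: single-node time per SGD step (arbitrary units, assumed)
    t_comm:    per-step gradient-exchange cost (assumed constant with N)
    """
    # A single node needs no network communication.
    t_comm_n = 0.0 if n_nodes == 1 else t_comm
    t_parallel = t_compute / n_nodes + t_comm_n
    return t_compute / t_parallel

# Speedup saturates: as N grows, t_comm dominates and the
# speedup approaches the ceiling t_compute / t_comm (here 20).
for n in (1, 8, 32, 128):
    print(n, speedup(n))
```

Under these assumed costs, going from 32 to 128 nodes yields far less than a 4x gain, which mirrors the abstract's claim that scaling stalls beyond a few dozen nodes.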