Distributed training of deep neural networks: Theoretical and practical limits of parallel scalability

 
Authors: Keuper, J.; Pfreundt, F.-J.

Source:

Institute of Electrical and Electronics Engineers -IEEE-; IEEE Computer Society; Association for Computing Machinery -ACM-:
2nd Workshop on Machine Learning in HPC Environments, MLHPC 2016 : Held as part of SC 2016, Salt Lake City, Utah, USA, 14 November 2016; Workshop on Machine Learning in High-Performance Computing Environments
Piscataway, NJ: IEEE, 2016
ISBN: 978-1-5090-3882-4
pp. 19-26
Workshop on Machine Learning in HPC Environments (MLHPC) <2, 2016, Salt Lake City/Utah>
Supercomputing Conference & Expo (SC) <2016, Salt Lake City/Utah>
English
Conference Paper
Fraunhofer ITWM

Abstract
This paper presents a theoretical analysis and practical evaluation of the main bottlenecks towards a scalable distributed solution for the training of Deep Neural Networks (DNNs). The presented results show that the current state-of-the-art approach, using data-parallelized Stochastic Gradient Descent (SGD), quickly turns into a vastly communication-bound problem. In addition, we present simple but fixed theoretical constraints that prevent effective scaling of DNN training beyond a few dozen nodes. This leads to poor scalability of DNN training in most practical scenarios.
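For context on the approach the abstract refers to, the following is a minimal sketch of synchronous data-parallel SGD, the scheme whose scaling limits the paper analyzes. Everything here is an illustrative assumption (a linear least-squares model, workers simulated in a single process, all variable names), not the authors' code; the np.mean call stands in for the network all-reduce a real cluster would perform.

    import numpy as np

    # Toy sketch of synchronous data-parallel SGD (illustrative only).
    # Each "worker" computes a gradient on its data shard; the gradients
    # are then averaged. On a real cluster this average is an all-reduce
    # whose message size equals the full parameter vector.
    def data_parallel_sgd_step(w, X, y, n_workers, lr=0.1):
        shards_X = np.array_split(X, n_workers)
        shards_y = np.array_split(y, n_workers)
        grads = []
        for Xi, yi in zip(shards_X, shards_y):
            residual = Xi @ w - yi                   # local forward pass
            grads.append(Xi.T @ residual / len(yi))  # local gradient
        g = np.mean(grads, axis=0)                   # stand-in for MPI_Allreduce
        return w - lr * g

    rng = np.random.default_rng(0)
    X = rng.normal(size=(512, 8))
    w_true = rng.normal(size=8)
    y = X @ w_true
    w = np.zeros(8)
    for _ in range(100):
        w = data_parallel_sgd_step(w, X, y, n_workers=4)
    print("parameter error:", np.linalg.norm(w - w_true))

Note that adding workers shrinks each shard, and with it the per-node compute, while the all-reduce message (one full gradient per step) stays the same size; this fixed communication term is what makes training communication-bound as node counts grow, which is the effect the paper quantifies.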

URL: http://publica.fraunhofer.de/documents/N-428879.html