Fraunhofer-Gesellschaft

Publica

Here you will find scientific publications from the Fraunhofer Institutes.

Using GPI-2 for distributed memory parallelization of the Caffe toolbox to speed up deep neural network training

 
Authors: Kühn, Martin; Keuper, Janis; Pfreundt, Franz-Josef

International Academy, Research, and Industry Association -IARIA-:
Seventh International Conference on Advanced Communications and Computation, INFOCOMP 2017 : June 25 - 29, 2017, Venice, Italy
IARIA, 2017
ISBN: 978-1-61208-567-8
ISBN: 978-1-61208-073-4
pp. 75-79
International Conference on Advanced Communications and Computation (INFOCOMP) <7, 2017, Venice>
English
Conference Paper, Electronic Publication
Fraunhofer ITWM

Abstract
Deep Neural Networks (DNNs) are currently of great interest in research and application. Training these networks is a compute-intensive and time-consuming task. To reduce training times to a bearable amount at reasonable cost, we extend the popular Caffe toolbox for DNNs with an efficient distributed-memory communication pattern. To achieve good scalability, we emphasize the overlap of computation and communication and prefer fine-grained synchronization patterns over global barriers. To implement these communication patterns, we rely on the "Global address space Programming Interface" version 2 (GPI-2) communication library. This interface provides a lightweight set of asynchronous one-sided communication primitives supplemented by non-blocking, fine-grained data synchronization mechanisms. Accordingly, we name our parallel version of Caffe CaffeGPI. First benchmarks demonstrate better scaling behavior compared with other extensions, e.g., Intel™ Caffe. Even within a single symmetric multiprocessing machine with four graphics processing units, CaffeGPI scales better than the standard Caffe toolbox. These first results demonstrate that using standard High Performance Computing (HPC) hardware is a valid cost-saving approach to training large DNNs. I/O is another bottleneck when working with DNNs in a standard parallel HPC setting, which we will consider in more detail in a forthcoming paper.
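The communication pattern described in the abstract (handing off each layer's gradient as soon as it is computed and synchronizing per layer rather than at a global barrier) can be illustrated with a minimal single-process sketch. This is not the CaffeGPI implementation and does not use the GPI-2 API: a Python thread stands in for asynchronous one-sided communication, one `threading.Event` per layer stands in for a fine-grained notification, and the function name `overlapped_step` and its parameters are hypothetical names chosen for illustration.

```python
import queue
import threading


def overlapped_step(local_grads, remote_grads):
    """Average per-layer gradients while the backward pass is still running.

    local_grads:  list of floats, one per layer (stand-in for this rank's gradients)
    remote_grads: list of lists; remote_grads[r][l] is remote rank r's gradient
                  for layer l (assumed to be already available, for simplicity)
    """
    num_layers = len(local_grads)
    num_ranks = 1 + len(remote_grads)
    reduced = [None] * num_layers
    # One flag per layer instead of a single global barrier.
    ready = [threading.Event() for _ in range(num_layers)]
    outbox = queue.Queue()

    def communicator():
        # Stand-in for asynchronous communication: reduce each layer as it
        # arrives, then signal completion for that layer only.
        while True:
            item = outbox.get()
            if item is None:  # sentinel: backward pass is finished
                break
            layer, grad = item
            total = grad + sum(r[layer] for r in remote_grads)
            reduced[layer] = total / num_ranks
            ready[layer].set()

    comm = threading.Thread(target=communicator)
    comm.start()

    # Backward pass visits the last layer first; each gradient is handed off
    # immediately so communication overlaps with the remaining computation.
    for layer in reversed(range(num_layers)):
        grad = local_grads[layer]  # placeholder for the real backpropagation
        outbox.put((layer, grad))
    outbox.put(None)

    # Weight update: wait only for the layer that is needed next.
    averaged = []
    for layer in range(num_layers):
        ready[layer].wait()
        averaged.append(reduced[layer])

    comm.join()
    return averaged
```

For example, `overlapped_step([1.0, 2.0, 3.0], [[3.0, 2.0, 1.0]])` averages two ranks' gradients layer by layer. In a real distributed setting the per-layer flag would be replaced by the communication library's own notification mechanism.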

URL: http://publica.fraunhofer.de/documents/N-477187.html