Here you will find scientific publications from the Fraunhofer Institutes.

GASPI/GPI in-memory checkpointing library

Bartsch, Valeria; Machado, Rui; Rahn, Mirko; Merten, Dirk; Pfreundt, Franz-Josef


Rivera, F.F.:
Euro-Par 2017. Parallel processing. 23rd International Conference on Parallel and Distributed Computing : Santiago de Compostela, Spain, August 28 - September 1, 2017; Proceedings
Cham: Springer International Publishing, 2017 (Lecture Notes in Computer Science 10417)
ISBN: 978-3-319-64203-1
ISBN: 978-3-319-64202-4
ISBN: 3-319-64202-2
International Conference on Parallel and Distributed Computing (Euro-Par) <23, 2017, Santiago de Compostela>
Conference Paper
Fraunhofer ITWM

Fault tolerance is becoming an important feature of large computer systems, where the mean time between failures decreases. Checkpointing is a method often used to provide resilience. We present an in-memory checkpointing library based on a PGAS API implemented with GASPI/GPI. It offers a substantial benefit when recovering from failure and leverages existing fault tolerance features of GASPI/GPI. The overhead of the library is negligible when tested with a simple stencil code and a real-life seismic imaging method.