2021
Conference Paper
Title

Efficient and Eventually Consistent Collective Operations

Abstract
Collective operations are common features of parallel programming models that are frequently used in High-Performance Computing (HPC) and machine/deep learning (ML/DL) applications. In strong scaling scenarios, collective operations can negatively impact the overall application performance: as the core count increases, the load per rank decreases, while the time spent in collective operations increases logarithmically. In this article, we propose a design for eventually consistent collectives suitable for ML/DL computations by reducing communication in Broadcast and Reduce, as well as by exploring the Stale Synchronous Parallel (SSP) synchronization model for the Allreduce collective. Moreover, we enrich the GASPI ecosystem with frequently used classic (consistent) collective operations, such as Allreduce for large messages and AlltoAll used in an HPC code. Our implementations show promising preliminary results with significant improvements, especially for Allreduce and AlltoAll, compared to the vendor-provided MPI alternatives.
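To illustrate the Stale Synchronous Parallel idea mentioned in the abstract, the following is a minimal sketch of the bounded-staleness rule as it is commonly described in the SSP literature. It is not taken from the paper and does not use GASPI: the names `may_proceed`, `blocked_ranks`, and `slack` are hypothetical, and the exact clock convention (here, a rank's clock counts its completed iterations) is an assumption.

```python
from typing import List


def may_proceed(clocks: List[int], rank: int, slack: int) -> bool:
    """Return True if `rank` may start its next iteration.

    Assumed convention: clocks[r] is the number of iterations rank r
    has completed. A rank may run ahead of the slowest rank by at most
    `slack` completed iterations; slack == 0 degenerates to fully
    synchronous (bulk-synchronous) execution.
    """
    return clocks[rank] - min(clocks) <= slack


def blocked_ranks(clocks: List[int], slack: int) -> List[int]:
    """List the ranks that must wait for slower ranks to catch up."""
    return [r for r in range(len(clocks)) if not may_proceed(clocks, r, slack)]


if __name__ == "__main__":
    clocks = [3, 1, 2]  # completed iterations per rank (illustrative values)
    for slack in (0, 1, 2):
        print(f"slack={slack}: blocked ranks = {blocked_ranks(clocks, slack)}")
```

With a larger `slack`, fewer ranks wait at each step, which is the trade-off an eventually consistent Allreduce would exploit: faster ranks continue with slightly stale contributions instead of stalling on the slowest rank.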
Author(s)
Iakymchuk, Roman  
Fraunhofer-Institut für Techno- und Wirtschaftsmathematik ITWM  
Faustino, Amandio
INESC-ID & IST (ULisboa)
Emerson, Andrew
CINECA, Casalecchio di Reno
Barreto, Joao
INESC-ID & IST (ULisboa)
Bartsch, Valeria  
Fraunhofer-Institut für Techno- und Wirtschaftsmathematik ITWM  
Rodrigues, Rodrigo
INESC-ID & IST (ULisboa)
Monteiro, Joao Carlos
INESC-ID & IST (ULisboa)
Mainwork
IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2021  
Project(s)
EPEEC  
Funder
European Commission  
Conference
International Parallel and Distributed Processing Symposium (IPDPS) 2021  
Workshop on Advances in Parallel and Distributed Computational Models (APDCM) 2021  
Open Access
DOI
10.1109/IPDPSW52791.2021.00096
Additional link
Full text
Language
English
Fraunhofer-Institut für Techno- und Wirtschaftsmathematik ITWM  
Keyword(s)
  • collective
  • Allreduce
  • AlltoAll
  • Stale Synchronous Parallel
  • GASPI