Fraunhofer-Gesellschaft
September 2024
Journal Article
Title

Malleability in Modern HPC Systems: Current Experiences, Challenges, and Future Opportunities

Abstract
As complex scientific simulations driven by workflows and heterogeneous workload profiles become more common, managing system resources effectively is essential for improving performance and system throughput, especially given trends such as heterogeneous HPC and deeply integrated systems with on-chip accelerators. Dynamic resource allocation can improve resource utilization and productivity across all system and application levels by adapting applications' configurations to the system's resources. In this context, malleable jobs, which can change their resources at runtime, can increase system throughput and resource utilization while bringing various advantages for HPC users (e.g., shorter waiting times). Malleability has received much attention recently, even though it has been an active research area for more than two decades. This article presents the state of the art of malleability implementations in HPC systems, focusing mainly on malleability in compute and I/O resources. Based on our experiences, we state our current concerns and list future opportunities for research.
Author(s)
Tarraf, Ahmad
Schreiber, Martin
Cascajo, Alberto
Besnard, Jean-Baptiste
Vef, Marc-André
Huber, Dominik
Happ, Sonja
Brinkmann, André
Singh, David
Hoppe, Hans-Christian
Miranda, Alberto
Peña, Antonio J.
Silva Machado, Rui Màrio da
Fraunhofer-Institut für Techno- und Wirtschaftsmathematik ITWM  
Garcia-Gasulla, Marta
Schulz, Martin
Carpenter, Paul
Pickartz, Simon
Rotaru, Tiberiu  
Fraunhofer-Institut für Techno- und Wirtschaftsmathematik ITWM  
Iserte, Sergio
Lopez, Victor
Ejarque, Jorge
Sirwani, Heena
Fraunhofer-Institut für Techno- und Wirtschaftsmathematik ITWM  
Carretero, Jesus
Wolf, Felix
Journal
IEEE Transactions on Parallel and Distributed Systems
Open Access
DOI
10.1109/TPDS.2024.3406764
Language
English
Fraunhofer-Institut für Techno- und Wirtschaftsmathematik ITWM  
Keyword(s)
  • Resource management
  • Runtime
  • Monitoring
  • Dynamic scheduling
  • Throughput
  • Terminology
  • Systems support
