Design and implementation of a distributed metascheduler

Heilgeist, J.; Soddemann, T.; Richter, H.

doi:10.1109/ADVCOMP.2009.17

2009

Conference Paper

Abstract

This paper describes a metascheduler for high-performance computing (HPC) grids that is built upon a distributed architecture. It is modelled around cooperating peers, represented by the local proxies that participating sites deploy. These proxies exchange job descriptions among themselves with the aim of improving user-, administration-, and grid-defined metrics. Relevant metrics can include, e.g., reduced job runtimes, improved resource utilization, and increased job turnover. The metascheduler uses peer-to-peer algorithms to discover under-utilized resources and unserviced jobs. A selection is made based on a simplified variant of the Analytic Hierarchy Process that we adapted to the special requirements imposed by the Grid. It enables geographically distributed stakeholders to participate in the decision and supports dynamic evaluation of the necessary utility values. Finally, we identify four intrinsic problems that obstruct the implementation of metaschedulers in general.
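
The abstract does not spell out the simplified Analytic Hierarchy Process variant, so the following is only an illustrative sketch of the textbook technique, not the authors' method. It assumes three hypothetical criteria (job runtime, resource utilization, job turnover), a made-up pairwise comparison matrix, and invented per-site utility values. Weights are approximated with the row geometric mean method, and each candidate site is scored by a weighted sum.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights using the row geometric mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def rank_sites(weights, site_utilities):
    """Score each site as the weighted sum of its per-criterion utilities,
    then return (site, score) pairs sorted from best to worst."""
    scores = {
        site: sum(w * u for w, u in zip(weights, utilities))
        for site, utilities in site_utilities.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical pairwise comparisons (assumption, not from the paper).
# Entry [i][j] states how much more important criterion i is than j;
# criteria order: job runtime, resource utilization, job turnover.
pairwise = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]
weights = ahp_weights(pairwise)

# Hypothetical utility values in [0, 1] reported by each site's proxy.
site_utilities = {
    "site_a": [0.9, 0.4, 0.5],
    "site_b": [0.5, 0.8, 0.7],
    "site_c": [0.3, 0.6, 0.9],
}
ranking = rank_sites(weights, site_utilities)

for site, score in ranking:
    print(f"{site}: {score:.3f}")
```

In a distributed setting, each stakeholder could supply its own comparison matrix and each proxy could recompute its utility values on demand; how the paper combines these inputs is not stated in the abstract.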

Author(s)

Heilgeist, J.

Soddemann, T.

Richter, H.

Mainwork

Third International Conference on Advanced Engineering Computing and Applications in Sciences, ADVCOMP 2009

Conference

International Conference on Advanced Engineering Computing and Applications in Sciences (ADVCOMP) 2009
