Iterative SLE solvers over a CPU-GPU platform

Binotto, Alecio; Daniel, Christian G.; Weber, Daniel; Kuijper, Arjan; Stork, André; Pereira, Carlos Eduardo; Fellner, Dieter W.

doi:10.1109/HPCC.2010.40

2010

Conference Paper

Abstract

GPUs (Graphics Processing Units) have become one of the main co-processors moving desktops towards high performance computing. Together with multi-core CPUs, they form a powerful heterogeneous execution platform for massive calculations. To improve application performance and exploit this heterogeneity, a balanced distribution of the workload over the PUs (Processing Units) plays an important role. However, this problem is challenging because the cost of a task on a PU is non-deterministic and can be influenced by several parameters not known a priori, such as the size of the problem domain. We present a comparison of iterative SLE (Systems of Linear Equations) solvers, used in many scientific and engineering applications, over a heterogeneous CPU-GPU platform and characterize the scenarios in which the solvers perform better. A new technique to improve memory access in the matrix-vector multiplication used by SLE solvers on GPUs is described and compared to standard implementations for CPUs and GPUs. The timing profiles are analyzed and break-even points based on problem size are identified for this implementation, indicating when it is faster to use the GPU instead of the CPU. Preliminary results show the importance of this study applied to a real-time CFD (Computational Fluid Dynamics) application with geometry modification.
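
The abstract does not say which iterative solvers were compared or how the memory-access technique works, so the following is only a generic illustration of why matrix-vector multiplication dominates the cost of an iterative SLE solver. It is a minimal CPU-only sketch in Python, assuming a symmetric positive definite matrix stored in CSR (compressed sparse row) format and using the conjugate gradient method. The choice of CG and CSR, and the names csr_matvec and conjugate_gradient, are assumptions made for illustration and are not taken from the paper.

```python
import math


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR format."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def dot(a, b):
    """Inner product of two equally sized vectors."""
    return sum(p * q for p, q in zip(a, b))


def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A (illustrative sketch)."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0 initially
    p = list(r)          # search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break
        # One sparse matrix-vector product per iteration: the dominant cost.
        Ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Example system: A = [[4, 1], [1, 3]], b = [1, 2].
values = [4.0, 1.0, 1.0, 3.0]
col_idx = [0, 1, 0, 1]
row_ptr = [0, 2, 4]
solution = conjugate_gradient(values, col_idx, row_ptr, [1.0, 2.0])
```

Each iteration performs exactly one call to csr_matvec, which is why the memory access pattern of that kernel is a natural optimization target on a GPU, and why the problem size at which a GPU version overtakes a CPU version has to be measured rather than assumed.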