Fraunhofer-Gesellschaft

Publica

Here you can find scientific publications from the Fraunhofer Institutes.

Conjugate Gradient Solvers with High Accuracy and Bit-wise Reproducibility between CPU and GPU using Ozaki scheme

 
Authors: Mukunoki, D.; Ozaki, K.; Ogita, T.; Iakymchuk, R.


Association for Computing Machinery -ACM-; Korea Institute of Science and Technology Information -KISTI-, Daejeon:
International Conference on High Performance Computing in Asia-Pacific Region, HPC Asia 2021. Proceedings : January 20-22, 2021, Republic of Korea
New York: ACM, 2021
ISBN: 978-1-4503-8842-9
pp.100-109
International Conference on High Performance Computing in Asia-Pacific Region (HPC Asia) <2021, Online>
English
Conference Paper, Electronic Publication
Fraunhofer ITWM

Abstract
In Krylov subspace methods such as the Conjugate Gradient (CG) method, the number of iterations until convergence may increase due to the loss of accuracy caused by rounding errors in floating-point computations. At the same time, because the order of operations is nondeterministic in parallel computation, the result and the convergence behavior may differ across computational environments, even for the same input. In this study, we present an accurate and reproducible implementation of the unpreconditioned CG method on x86 CPUs and NVIDIA GPUs. In our method, all variables are stored in FP64, while all inner product operations (including matrix-vector multiplications) are performed using the Ozaki scheme. The scheme delivers correctly rounded results as well as bit-level reproducibility across different computational environments. In this paper, we show examples where the standard FP64 implementation of CG produces different results on different CPUs and GPUs. We then demonstrate the applicability and effectiveness of our approach in terms of accuracy and reproducibility, as well as its performance on both CPUs and GPUs. Furthermore, we compare the performance of our method against an existing accurate and reproducible CG implementation based on the Exact Basic Linear Algebra Subprograms (ExBLAS) on CPUs.
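The abstract's central claim is that computing inner products with the Ozaki scheme makes them both correctly rounded and independent of evaluation order. The following is a minimal Python sketch of that idea for illustration only, not the authors' CPU/GPU implementation. Each vector is split without error into slices whose entries share a power-of-two scale and carry few significant bits, so every slice-by-slice dot product is exact in FP64 no matter how it is summed; the exact partial results are then combined with `math.fsum`, which CPython rounds correctly. The function names (`naive_dot`, `split_slices`, `ozaki_dot`, `ozaki_matvec`, `exact_dot`), the slice-width rule `b = (53 - ceil(log2 n)) // 2`, and the use of `math.fsum` for the final accumulation are assumptions of this sketch, which also assumes no overflow or underflow occurs.

```python
import math
from fractions import Fraction


def naive_dot(x, y):
    """Plain FP64 dot product; the result depends on the summation order."""
    s = 0.0
    for a, c in zip(x, y):
        s += a * c
    return s


def split_slices(v, b):
    """Error-free split of v into slices whose elementwise sum equals v exactly.

    Every entry of a slice is k * 2**e for one slice-wide exponent e and an
    integer |k| <= 2**b. Assumption: no overflow or underflow occurs.
    """
    slices = []
    r = list(v)
    while any(r):                          # 0.0 and -0.0 both count as zero
        mu = max(abs(t) for t in r)
        _, big_e = math.frexp(mu)          # mu < 2**big_e
        e = big_e - b                      # granularity of this slice
        # Round each entry to the nearest multiple of 2**e (exact scaling).
        hi = [math.ldexp(round(math.ldexp(t, -e)), e) for t in r]
        r = [t - h for t, h in zip(r, hi)]  # exact: residual is representable
        slices.append(hi)
    return slices


def ozaki_dot(x, y):
    """Correctly rounded, order-independent dot product (hypothetical sketch)."""
    if len(x) != len(y):
        raise ValueError("length mismatch")
    n = len(x)
    if n == 0:
        return 0.0
    # Assumed slice width: b is chosen so that n * 2**(2*b) <= 2**53, hence
    # every partial sum of a slice-by-slice dot product is an integer multiple
    # of 2**(e_x + e_y) small enough to be exact in FP64.
    b = (53 - (n - 1).bit_length()) // 2
    y_slices = split_slices(y, b)
    partials = []
    for xs in split_slices(x, b):
        for ys in y_slices:
            partials.append(naive_dot(xs, ys))  # exact in any order
    # Correctly rounded sum of the exact partials (order-independent).
    return math.fsum(partials)


def ozaki_matvec(A, x):
    """Each entry of A @ x is an inner product, so reuse ozaki_dot per row."""
    return [ozaki_dot(row, x) for row in A]


def exact_dot(x, y):
    """Slow rational reference: the correctly rounded dot product."""
    total = sum((Fraction(a) * Fraction(c) for a, c in zip(x, y)), Fraction(0))
    return float(total)


if __name__ == "__main__":
    x, y = [1e16, 1.0, -1e16], [1.0, 1.0, 1.0]
    print(naive_dot(x, y))                   # 0.0 (the 1.0 is absorbed)
    print(naive_dot([1e16, -1e16, 1.0], y))  # 1.0 (same data, other order)
    print(ozaki_dot(x, y))                   # 1.0
```

Because each slice product is exact and the final summation is correctly rounded, reordering the inputs (as a different thread count or GPU reduction tree would) cannot change the result. The cost is extra FP64 work: the number of slice pairs grows with the dynamic range of each vector.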

URL: http://publica.fraunhofer.de/documents/N-637583.html