2020
Report
Title

Conjugate Gradient Solvers with High Accuracy and Bit-wise Reproducibility between CPU and GPU using Ozaki Scheme

Abstract
In Krylov subspace methods such as the Conjugate Gradient (CG) method, the number of iterations until convergence may increase due to the loss of accuracy caused by rounding errors in floating-point computations. Moreover, because the order of operations is non-deterministic in parallel computations, the result and the convergence behavior may differ between environments, even for the same input. This paper presents a new approach for the CG method that provides high accuracy as well as bit-level reproducibility of computed solutions on many-core processors, including both x86 CPUs and NVIDIA GPUs. In the proposed approach, accurate and reproducible operations are used for all inner-product-based operations, such as matrix-vector multiplication and dot product, which are the main sources of non-reproducibility in the CG method. These operations are performed using the Ozaki scheme, an error-free transformation for the dot product that can ensure correct rounding. Because this method can be built upon vendor-provided linear algebra libraries such as Intel Math Kernel Library and NVIDIA cuBLAS/cuSPARSE, it reduces the development cost. Using examples with non-identical convergence and computed solutions on different platforms, we demonstrate the applicability and effectiveness of the proposed approach as well as its performance on both CPUs and GPUs. In addition, we compare it against an existing accurate and reproducible CG implementation based on Exact BLAS (ExBLAS) on CPUs.
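The reproducibility problem described in the abstract can be illustrated with a minimal Python sketch. It is not the Ozaki scheme and not the authors' implementation: the function names `naive_dot` and `exact_dot` are hypothetical, and exact rational arithmetic is used here only as a simple stand-in for a correctly rounded dot product. The sketch shows that a plain floating-point dot product changes with the order of its terms, while a correctly rounded one does not.

```python
from fractions import Fraction


def naive_dot(x, y):
    """Left-to-right floating-point dot product; the result depends on term order."""
    acc = 0.0
    for a, b in zip(x, y):
        acc += a * b
    return acc


def exact_dot(x, y):
    """Dot product computed exactly in rational arithmetic, rounded once at the end.

    Assumption: this stands in for a correctly rounded dot product; it is far
    slower than the Ozaki scheme and is used only to show order independence.
    """
    total = sum((Fraction(a) * Fraction(b) for a, b in zip(x, y)), Fraction(0))
    return float(total)


# Same terms in two different orders, as could happen in a parallel reduction.
y = [1.0, 1.0, 1.0]
order_a = [1e16, 1.0, -1e16]
order_b = [1e16, -1e16, 1.0]

print(naive_dot(order_a, y), naive_dot(order_b, y))  # results differ
print(exact_dot(order_a, y), exact_dot(order_b, y))  # results agree
```

In the first ordering the term 1.0 is absorbed when added to 1e16, so the naive result loses it entirely; the correctly rounded variant returns the same value regardless of ordering, which is the property a reproducible CG solver needs from its inner products.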
Author(s)
Mukunoki, Daichi
RIKEN Center for Computational Science Kobe, Hyogo
Ozaki, Katsuhisa
Shibaura Institute of Technology Saitama, Japan
Ogita, Takeshi
Tokyo Woman's Christian University Tokyo, Japan
Iakymchuk, Roman  
Fraunhofer-Institut für Techno- und Wirtschaftsmathematik ITWM  
Project(s)
KAKENHI
Robust
Funder
Japan Society for the Promotion of Science JSPS  
European Commission EC  
Language
English
Fraunhofer-Institut für Techno- und Wirtschaftsmathematik ITWM  