The Preconditioned Conjugate Gradient method is widely used in numerical simulations, yet the solver is also known to lose accuracy when computing the residual. In this article, we pursue a twofold goal: to enhance the accuracy of the solver and to ensure its reproducibility in a message-passing implementation. We design and employ several strategies, ranging from the ExBLAS approach (preserving every bit of information until the final rounding) to its more lightweight, performance-oriented variant (expanding the intermediate precision). These algorithmic strategies are reinforced with programmability suggestions to ensure deterministic executions. Finally, we verify these strategies on modern HPC systems: both versions deliver a reproducible number of iterations, residuals, direct errors, and solution vectors at an overhead of only 29% (ExBLAS) and 4% (lightweight) on 768 processes.
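To make the two ideas concrete, the following minimal Python sketch contrasts a naive sum with two accurate alternatives. It is an illustration only, not the ExBLAS code: `naive_sum`, `two_sum`, and `expansion_sum` are hypothetical names introduced here, the standard-library `math.fsum` merely stands in for "preserving every bit until the final rounding", and the two-term expansion is a simplified stand-in for the lightweight variant. A fixed-size expansion on its own is not guaranteed to be reproducible for arbitrary input; it happens to be exact for the integer-valued test data assumed below.

```python
import math
import random


def naive_sum(values):
    """Plain left-to-right floating-point summation (order dependent)."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


def two_sum(a, b):
    """Knuth's TwoSum: return (s, e) with s = fl(a + b) and a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def expansion_sum(values):
    """Accumulate into a two-term expansion (hi, lo).

    Assumption: a simplified sketch of "expanding the intermediate
    precision"; the rounding error of every addition is kept in lo.
    """
    hi, lo = 0.0, 0.0
    for v in values:
        hi, err = two_sum(hi, v)
        lo += err
    return hi + lo


# Hypothetical test data: large terms cancel, the exact sum is 2000.
data = [1e16, 1.0, -1e16, 1.0] * 1000

# A different summation order, as a different process count or
# reduction tree might produce in a parallel run.
shuffled = data[:]
random.Random(0).shuffle(shuffled)

print("naive, original order :", naive_sum(data))
print("naive, shuffled order :", naive_sum(shuffled))
print("fsum, both orders     :", math.fsum(data), math.fsum(shuffled))
print("expansion, both orders:", expansion_sum(data), expansion_sum(shuffled))
```

In this sketch the naive sum loses the small terms, whereas `math.fsum` and the two-term expansion recover the exact value regardless of ordering. That order independence is the property the reproducibility strategies aim for across parallel reductions.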
- Floating-point expansion
- Long accumulator
- Preconditioned Conjugate Gradient
- High-Performance Computing
Iakymchuk, R., Barreda, M., Wiesenberger, M., Aliaga, J. I., & Quintana-Ortí, E. S. (2020). Reproducibility strategies for parallel Preconditioned Conjugate Gradient. Journal of Computational and Applied Mathematics, 371. https://doi.org/10.1016/j.cam.2019.112697