Reproducible Parallel Simulations in HPC

2017 
Post-Moore era supercomputing will certainly require more hierarchical parallelism and variable-precision floating-point arithmetic to satisfy the computing needs of exascale numerical simulations. Nevertheless, floating-point addition will remain non-associative, so parallel computations will still be prone to returning results that differ from one run to the next. These failures of numerical reproducibility reduce the reliability of simulations and complicate the debugging and validation of large-scale software. We present two case studies to illustrate how to recover numerical reproducibility without jeopardizing computing efficiency. Parallel hydrodynamics simulations with the openTelemac code rely on finite element modeling, subdomain decomposition and iterative solvers. Two openTelemac modules have been modified to provide reproducible results for any number of computing units thanks to targeted compensation techniques. We also describe and analyze the generic solutions provided by reproducible and accurately rounded BLAS.
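To make the reproducibility issue concrete, the following minimal sketch shows how a plain parallel-style sum can change with the number of computing units, and how a compensated sum that carries its rounding errors through the reduction can give the same value in this example. It is an illustration under stated assumptions, not the openTelemac implementation: the function names (`two_sum`, `local_sum`, `reduce_partials`, `parallel_sum`, `naive_parallel_sum`) and the test data are hypothetical, and the "computing units" are simulated by slicing a list. Compensation improves accuracy; identical results across decompositions follow only when the compensated result is accurate enough, as it is here.

```python
import math


def two_sum(a, b):
    """Error-free transformation (Knuth's TwoSum): s = fl(a + b), a + b = s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def local_sum(values):
    """Sum on one (simulated) computing unit; return (sum, accumulated rounding error)."""
    s, err = 0.0, 0.0
    for x in values:
        s, e = two_sum(s, x)
        err += e
    return s, err


def reduce_partials(partials):
    """Combine (sum, error) pairs; the error terms are added back only at the very end."""
    s, err = 0.0, 0.0
    for part_sum, part_err in partials:
        s, e = two_sum(s, part_sum)
        err += e + part_err
    return s + err


def split(values, nunits):
    """Cut the data into nunits contiguous chunks (a stand-in for subdomains)."""
    n = len(values)
    bounds = [n * k // nunits for k in range(nunits + 1)]
    return [values[bounds[k]:bounds[k + 1]] for k in range(nunits)]


def parallel_sum(values, nunits):
    """Compensated sum over nunits simulated units."""
    return reduce_partials(local_sum(chunk) for chunk in split(values, nunits))


def naive_parallel_sum(values, nunits):
    """Plain floating-point sum: per-chunk partial sums, then a sum of the partials."""
    total = 0.0
    for chunk in split(values, nunits):
        partial = 0.0
        for x in chunk:  # explicit loop: the built-in sum() may itself compensate
            partial += x
        total += partial
    return total


# Hypothetical ill-conditioned data; the exact sum is 2.
data = [1e16, 1.0, -1e16, 1.0]
for units in (1, 2, 3, 4):
    print(units, naive_parallel_sum(data, units), parallel_sum(data, units))
print("correctly rounded reference:", math.fsum(data))
```

The design choice worth noting is that each unit returns its error term alongside its partial sum instead of folding it in locally, so the rounding information lost in the partial sums is still available to the final reduction.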