Open-Source Shared Memory implementation of the HPCG benchmark: analysis, improvements and evaluation on Cavium ThunderX2

2019 
The High Performance Conjugate Gradient (HPCG) benchmark complements the LINPACK benchmark in evaluating the performance of large High Performance Computing (HPC) systems. Because of its lower arithmetic intensity and higher memory pressure, HPCG is recognized as a more representative benchmark for data-center workloads and workloads with irregular memory access patterns, and its popularity within the HPC community has been rising steadily. Only a small fraction of the reference version of the HPCG benchmark is parallelized with shared-memory techniques (OpenMP), so in this paper we introduce and evaluate in depth two OpenMP parallelization strategies for the Gauss-Seidel preconditioner. Given the increasing attractiveness of the Arm architecture and ecosystem in HPC, we evaluate our modified HPCG version on a state-of-the-art HPC system based on the Cavium ThunderX2 SoC. We consider our work a broader contribution that is not exclusive to Arm: along with this paper, the source code of the modified HPCG has been made publicly available on GitLab to enable further optimizations for the benefit of the whole HPC community.
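The abstract does not name the two parallelization strategies, so the following is only an illustrative sketch of one widely used way to expose shared-memory parallelism in a Gauss-Seidel sweep: multi-coloring. Rows are grouped into colors such that no two rows of the same color are coupled, and each color is then updated with an OpenMP parallel loop. The `CsrMatrix` type, the helper names, and the 1D Poisson test problem are all assumptions made for this example and are not taken from the HPCG source. The sketch also performs only a forward sweep, whereas HPCG uses a symmetric (forward plus backward) Gauss-Seidel.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal CSR matrix; this is NOT the HPCG SparseMatrix type.
struct CsrMatrix {
    std::vector<std::size_t> row_ptr;  // size rows + 1
    std::vector<std::size_t> col;      // column index of each nonzero
    std::vector<double> val;           // value of each nonzero
    std::size_t rows() const { return row_ptr.size() - 1; }
};

// Illustrative test problem: the 1D Poisson matrix tridiag(-1, 2, -1).
CsrMatrix make_poisson_1d(std::size_t n) {
    CsrMatrix a;
    a.row_ptr.push_back(0);
    for (std::size_t i = 0; i < n; ++i) {
        if (i > 0) { a.col.push_back(i - 1); a.val.push_back(-1.0); }
        a.col.push_back(i); a.val.push_back(2.0);
        if (i + 1 < n) { a.col.push_back(i + 1); a.val.push_back(-1.0); }
        a.row_ptr.push_back(a.col.size());
    }
    return a;
}

// Red-black coloring for the 1D chain: even rows are one color, odd rows the other.
std::vector<std::vector<std::size_t>> red_black_coloring(std::size_t n) {
    std::vector<std::vector<std::size_t>> colors(2);
    for (std::size_t i = 0; i < n; ++i) colors[i % 2].push_back(i);
    return colors;
}

// One forward Gauss-Seidel sweep, processed color by color.
// Precondition (assumed, not checked): rows sharing a color have no
// off-diagonal coupling, so the inner loop is free of data races.
void colored_gauss_seidel_sweep(const CsrMatrix& a,
                                const std::vector<std::vector<std::size_t>>& rows_by_color,
                                const std::vector<double>& b,
                                std::vector<double>& x) {
    for (const auto& rows : rows_by_color) {      // colors run one after another
        const long count = static_cast<long>(rows.size());
        #pragma omp parallel for schedule(static)
        for (long k = 0; k < count; ++k) {        // rows of one color run in parallel
            const std::size_t i = rows[static_cast<std::size_t>(k)];
            double sum = b[i];
            double diag = 0.0;
            for (std::size_t p = a.row_ptr[i]; p < a.row_ptr[i + 1]; ++p) {
                if (a.col[p] == i) diag = a.val[p];
                else sum -= a.val[p] * x[a.col[p]];
            }
            assert(diag != 0.0);
            x[i] = sum / diag;
        }
    }
}

// Euclidean norm of the residual b - A x, used to observe convergence.
double residual_norm(const CsrMatrix& a, const std::vector<double>& b,
                     const std::vector<double>& x) {
    double sq = 0.0;
    for (std::size_t i = 0; i < a.rows(); ++i) {
        double r = b[i];
        for (std::size_t p = a.row_ptr[i]; p < a.row_ptr[i + 1]; ++p)
            r -= a.val[p] * x[a.col[p]];
        sq += r * r;
    }
    return std::sqrt(sq);
}
```

To try it, build a small system with `make_poisson_1d`, obtain the colors from `red_black_coloring`, and call `colored_gauss_seidel_sweep` repeatedly while watching `residual_norm`. With GCC or Clang the pragma takes effect when compiling with `-fopenmp`; without that flag the code runs sequentially. Coloring changes the order in which unknowns are updated, so the convergence behaviour can differ from a natural-order sequential sweep.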