Machine-agnostic and Communication-aware Designs for MPI on Emerging Architectures

2020 
Modern multi-/many-core processors offer higher core density, hardware multi-threading, deeper memory hierarchies, and diverse architectural capabilities. Emerging cloud-based HPC systems can deliver near-native performance, but they further increase architectural diversity. The Message Passing Interface (MPI) offers the flexibility to bind application processes to CPU cores arbitrarily; however, the static binding policies typically used take into account neither the application's communication pattern nor the underlying machine architecture. This disconnect between the dynamic behavior of applications and the architectural diversity of modern processors makes it difficult for application developers and MPI designers to exploit modern multi-/many-core systems to their full potential. In this paper, we propose a set of low-level benchmarking-based approaches and MPI-level designs to infer vendor-specific machine characteristics (e.g., physical-to-virtual machine topologies) and the dynamic communication patterns of applications. Using this information, we propose two novel algorithms that construct efficient MPI process mappings for any given architecture and application communication pattern. The proposed designs are implemented in the MVAPICH2 MPI library and evaluated on three different architectures using various micro-benchmarks and application kernels. We demonstrate up to 2X performance improvement for MPI collectives, and up to 3.5X and 26% improvement for the NAS-CG and miniAMR application kernels, respectively.
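The abstract does not spell out the two mapping algorithms, so the following is only a generic illustration of the underlying idea, not the MVAPICH2 designs. It assumes a communication matrix C (C[i][j] = volume sent from rank i to rank j, as a profiler might record) and a core distance matrix D (e.g., derived from benchmarked inter-core latencies). The mapping σ is chosen to reduce the cost Σ_i Σ_j C[i][j] · D[σ(i)][σ(j)]. The greedy heuristic, the function names (`mapping_cost`, `greedy_map`) and the toy matrices are all hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def mapping_cost(comm: Matrix, dist: Matrix, mapping: List[int]) -> int:
    """Sum over rank pairs of traffic times distance between their cores."""
    n = len(comm)
    return sum(comm[i][j] * dist[mapping[i]][mapping[j]]
               for i in range(n) for j in range(n))


def pair_volume(comm: Matrix, a: int, b: int) -> int:
    """Traffic exchanged in both directions between ranks a and b."""
    return comm[a][b] + comm[b][a]


def greedy_map(comm: Matrix, dist: Matrix) -> List[int]:
    """Hypothetical greedy heuristic: returns mapping[rank] = core."""
    n_ranks, n_cores = len(comm), len(dist)
    if n_ranks > n_cores:
        raise ValueError("need at least one core per rank")
    total = [sum(pair_volume(comm, r, p) for p in range(n_ranks))
             for r in range(n_ranks)]
    mapping = [-1] * n_ranks
    placed: List[int] = []
    free = list(range(n_cores))
    while len(placed) < n_ranks:
        unplaced = [r for r in range(n_ranks) if mapping[r] == -1]
        # Next rank: most traffic with already-placed ranks,
        # ties broken by total traffic, then by lowest rank id.
        r = max(unplaced, key=lambda u: (
            sum(pair_volume(comm, u, p) for p in placed), total[u], -u))
        # Core: the free core adding the least cost w.r.t. placed ranks,
        # ties broken by lowest core id.
        c = min(free, key=lambda k: (
            sum(pair_volume(comm, r, p) * dist[k][mapping[p]]
                for p in placed), k))
        mapping[r] = c
        placed.append(r)
        free.remove(c)
    return mapping


# Toy (hypothetical) machine: cores 0-1 on socket 0, cores 2-3 on socket 1.
# Distance: 1 within a socket, 3 across sockets.
DIST = [[0, 1, 3, 3],
        [1, 0, 3, 3],
        [3, 3, 0, 1],
        [3, 3, 1, 0]]
# Toy pattern: ranks 0<->2 and 1<->3 exchange heavily, 0<->1 lightly.
COMM = [[0, 1, 100, 0],
        [1, 0, 0, 100],
        [100, 0, 0, 0],
        [0, 100, 0, 0]]

if __name__ == "__main__":
    identity = [0, 1, 2, 3]
    best = greedy_map(COMM, DIST)
    print(best, mapping_cost(COMM, DIST, identity),
          mapping_cost(COMM, DIST, best))
```

On this toy input, a static in-order binding puts both heavy pairs across sockets (cost 1202). The greedy mapping `[0, 2, 1, 3]` co-locates each heavy pair on one socket (cost 406). Real designs would also have to handle hardware threads, non-uniform cache sharing, and the cost of profiling itself.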