Multiple endpoints for improved MPI performance on a lattice QCD code
2018
This paper presents results from using multiple threads together with a high-performance MPI implementation that supports MPI_THREAD_MULTIPLE. The approach is applied to a lattice QCD code (CCS-QCD) running on the Oakforest-PACS machine. Performance improved over the baseline code by as much as 1.8x for smaller lattice sizes.