Scaling and Optimizing the Gysela Code on a Cluster of Many-Core Processors

Guillaume Latu, Yuuichi Asahi, Julien Bigot, Tamás Fehér, Virginie Grandgirard

2018

The current generation of the Xeon Phi Knights Landing (KNL) processor provides a highly multi-threaded environment on which regular programming models such as MPI/OpenMP can be used. Many factors impact the performance achieved by applications on these devices: one key point is the efficient exploitation of the SIMD vector units, and another is the memory access pattern. Work has been carried out to adapt a plasma turbulence application, namely Gysela, to this architecture. A set of different techniques has been used: standard vectorization techniques, auto-tuning of one computation kernel, and switching to a high-order scheme. As a result, KNL execution times have been reduced by up to a factor of 3. This effort has also yielded a speedup of 2x on the Broadwell architecture and 3x on Skylake. Good scalability up to a few thousand cores has been obtained in a strong scaling experiment. This incremental work yielded a large payoff without resorting to low-level intrinsics.
