Exploiting Very-Wide Vectors on Intel Xeon Phi with Lattice-QCD Kernels

Andreas Diavastos, Giannos Stylianou, Giannis Koutsou

2016

Our goal in this work is to study ways of exploiting the parallelism offered by vectorization on accelerators with very wide vector units. To this end, we implement two kernels derived from the Wilson Dslash operator and investigate several data layout techniques for increasing the scalability of lattice QCD kernels on the Intel Xeon Phi. In parts of the application where the computation uses real numbers, we see a 6.6x increase in bandwidth compared to scalar code, thanks to auto-vectorization by the compiler. In other kernels, where arithmetic operations on complex numbers dominate, our hand-vectorized code outperforms the compiler's auto-vectorization. We find that our proposed Hopping Vector-friendly Ordering allows for more efficient vectorization of complex floating-point arithmetic. Using this data layout, we increase the sustained bandwidth by approximately 1.8x.
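To illustrate why data layout matters for vectorizing complex arithmetic, the following is a minimal sketch in C contrasting an interleaved layout with a split real/imaginary layout for an element-wise complex multiplication. It is not the Hopping Vector-friendly Ordering, which the abstract does not define; the type `cplx` and the functions `cmul_aos` and `cmul_soa` are hypothetical names chosen for this example.

```c
#include <stddef.h>

/* Interleaved layout (assumed for illustration): re, im, re, im, ... */
typedef struct { float re, im; } cplx;

/* z[i] = x[i] * y[i] on interleaved data. Real and imaginary parts share
 * a vector register, so a vectorized version typically needs shuffles. */
void cmul_aos(size_t n, const cplx *x, const cplx *y, cplx *z)
{
    for (size_t i = 0; i < n; i++) {
        float re = x[i].re * y[i].re - x[i].im * y[i].im;
        float im = x[i].re * y[i].im + x[i].im * y[i].re;
        z[i].re = re;
        z[i].im = im;
    }
}

/* z[i] = x[i] * y[i] on split data: real and imaginary parts live in
 * separate contiguous arrays, so every operation is a unit-stride
 * multiply or add on plain float arrays. */
void cmul_soa(size_t n,
              const float *restrict xr, const float *restrict xi,
              const float *restrict yr, const float *restrict yi,
              float *restrict zr, float *restrict zi)
{
    for (size_t i = 0; i < n; i++) {
        zr[i] = xr[i] * yr[i] - xi[i] * yi[i];
        zi[i] = xr[i] * yi[i] + xi[i] * yr[i];
    }
}
```

Both functions compute the same products. The split form gives a compiler a loop over contiguous real arrays, which is generally easier to map onto wide vector units than the interleaved form.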
