Heterogeneous Programming and Optimization of Gyrokinetic Toroidal Code and Large-Scale Performance Test on TH-1A

2013 
In this work, we discuss the porting of the latest production version of the Gyrokinetic Toroidal Code (GTC), a petascale fusion simulation code based on the particle-in-cell method, to the GPU platform. New GPU parallel algorithms have been designed for the particle push and shift operations. The GPU version of GTC was benchmarked on up to 3072 nodes of the Tianhe-1A supercomputer and shows an overall speedup of about 2x–3x when comparing NVIDIA M2050 GPUs to Intel Xeon X5670 CPUs. Strong and weak scaling studies were performed using actual production simulation parameters, providing insights into GTC’s scalability and bottlenecks on large GPU supercomputers.