Memory latency optimizations for the elementary functions on the Sunway architecture

2019 
As fundamental software of high-performance computers, elementary functions have a significant impact on the performance of high-level applications. Built on a Chinese-designed manycore system consisting of processing cores and auxiliary cores, the Sunway TaihuLight supercomputer is considered one of the fastest supercomputers in the world and has ranked at the top of the TOP500 list several times. The processing cores of the Sunway architecture are coupled through shared memory, which leads to high memory access latency and degrades the performance of elementary functions, whose implementations perform many memory accesses. To address this issue, we propose a set of memory latency optimizations for the Sunway processing cores. First, we obtain a reduced data table while still guaranteeing accuracy by optimizing the underlying algorithms, grouping and mapping table entries, removing error compensations, etc. Second, we move data from the global memory shared by all processing cores to the scratchpad memory of individual processing cores, significantly reducing memory latency. Finally, we convert the memory accesses that cannot be localized because of the limited scratchpad space into equivalent immediate loads and/or shift operations, further improving performance. In addition, we automate the process by selecting the most suitable data conversion approach and table-lookup algorithm, which effectively mitigates code explosion. We implement our method and evaluate the optimizations experimentally on the Sunway architecture. The experimental results show that exponential functions achieve performance improvements of 91% and 86.2% from the data movement and data conversion strategies, respectively.
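The table-lookup and data-conversion ideas described above can be illustrated with a minimal C sketch of a table-driven exponential. This is not the paper's implementation. The function names, the 4-entry table of 2^(j/4), and the degree-8 polynomial are all assumptions chosen for illustration, and the sketch makes no accuracy guarantee. Plain C cannot express scratchpad placement, so the data movement step is not modeled. The sketch only contrasts a lookup that reads from an array with one that spells the same constants in code, and it builds the power-of-two scale factor with a shift instead of a load. Whether the in-code constants become true immediate loads depends on the compiler and the instruction set.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameters: a deliberately small table of 2^(j/4). */
#define TABLE_SIZE 4
#define LN2 0.6931471805599453

/* Variant A: constants stored in a data table (a memory access per lookup). */
static const double exp2_frac_table[TABLE_SIZE] = {
    1.0,                 /* 2^(0/4) */
    1.189207115002721,   /* 2^(1/4) */
    1.4142135623730951,  /* 2^(2/4) */
    1.681792830507429    /* 2^(3/4) */
};

static double frac_from_table(int j) {
    assert(j >= 0 && j < TABLE_SIZE);
    return exp2_frac_table[j];
}

/* Variant B: the same constants written in code, so no array is indexed.
 * This only approximates "immediate loads"; the compiler decides the rest. */
static double frac_from_immediate(int j) {
    assert(j >= 0 && j < TABLE_SIZE);
    switch (j) {
    case 0:  return 1.0;
    case 1:  return 1.189207115002721;
    case 2:  return 1.4142135623730951;
    default: return 1.681792830507429;
    }
}

/* Build 2^k by shifting the biased exponent into place (normal range only),
 * instead of loading a power of two from memory. */
static double pow2_by_shift(int k) {
    assert(k >= -1022 && k <= 1023);
    uint64_t bits = (uint64_t)(k + 1023) << 52;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* exp(x) = 2^k * 2^(j/4) * exp(r), with x = (4k + j) * ln2/4 + r. */
static double exp_sketch(double x, int use_immediates) {
    double t = x * (TABLE_SIZE / LN2);
    int m = (int)(t + (t >= 0.0 ? 0.5 : -0.5));          /* round to nearest */
    int j = ((m % TABLE_SIZE) + TABLE_SIZE) % TABLE_SIZE; /* 0..3 */
    int k = (m - j) / TABLE_SIZE;                         /* exact division */
    double r = x - m * (LN2 / TABLE_SIZE);                /* |r| <= ln2/8 */

    /* Degree-8 Taylor polynomial for exp(r), evaluated with Horner's rule. */
    double p = 1.0 + r * (1.0 + r * (1.0 / 2 + r * (1.0 / 6 + r * (1.0 / 24
             + r * (1.0 / 120 + r * (1.0 / 720 + r * (1.0 / 5040
             + r * (1.0 / 40320))))))));

    double frac = use_immediates ? frac_from_immediate(j) : frac_from_table(j);
    return pow2_by_shift(k) * frac * p;
}
```

Calling `exp_sketch(1.0, 0)` uses the array lookup and `exp_sketch(1.0, 1)` uses the in-code constants. Both follow the same arithmetic, so the choice affects only where the constants come from. A smaller table leaves a larger reduced argument `r`, which is why the sketch pairs its 4-entry table with a relatively high polynomial degree.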