Superblock-based performance optimization for Sunway Math Library on SW26010 many-core processor

2021 
The SW26010 many-core processor is based on the Sunway architecture, which is composed of management processing elements (MPEs) and computing processing elements (CPEs); each type is equipped with its own stand-alone math library. Every version of the Sunway Math Library (SML) is written in assembly, which puts it beyond the reach of compilers that take high-level languages as input, so existing optimization approaches rely mainly on manual strategies, which are inefficient. In this paper, we leverage superblock scheduling, a well-known compilation technique, and present a tool named SMPOT to optimize the SML. SMPOT first builds a superblock using a novel tail duplication algorithm, then applies code motion restrictions so that no compensation code is needed, and then matches the code against the machine model. Finally, it reorders the instructions on the main path with an activation algorithm based on the available computing resources. The experimental results show that SMPOT effectively improves the performance of the SML. For MPE functions, main path performance improves by 10.61% on average and overall performance by 5.40%. For CPE functions, main path performance improves by 5.72% on average and overall performance by 2.98%.
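To make the superblock step concrete, the following is a minimal sketch of classic textbook tail duplication on a control-flow graph. It is not SMPOT's novel algorithm, whose details the abstract does not give. The CFG representation (a dict from block name to successor names), the function name `form_superblock`, and the `'` suffix for duplicated blocks are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical CFG representation: block name -> list of successor block names.
CFG = Dict[str, List[str]]


def form_superblock(cfg: CFG, trace: List[str]) -> CFG:
    """Return a new CFG in which `trace` has a single entry at trace[0].

    Classic tail duplication: find the first trace block that can be
    entered from outside the trace (a side entrance), copy the trace
    from that block to its end, and send every side entrance to the copy.
    """
    new_cfg = {block: list(succs) for block, succs in cfg.items()}

    # Locate the first block after the head with a side entrance.
    first = None
    for i in range(1, len(trace)):
        preds = [p for p, succs in cfg.items() if trace[i] in succs]
        if any(p != trace[i - 1] for p in preds):
            first = i
            break
    if first is None:
        return new_cfg  # the trace is already a superblock

    tail = trace[first:]
    copy_of = {block: block + "'" for block in tail}  # assumed naming scheme

    # Duplicate the tail; edges between tail blocks stay inside the copy.
    for block in tail:
        new_cfg[copy_of[block]] = [copy_of.get(s, s) for s in cfg[block]]

    # Redirect each side entrance to the duplicated block.
    for pred, succs in cfg.items():
        for k, succ in enumerate(succs):
            if succ in copy_of and pred != trace[trace.index(succ) - 1]:
                new_cfg[pred][k] = copy_of[succ]
    return new_cfg


if __name__ == "__main__":
    # Diamond CFG: A branches to B or C, both join at D, then E.
    cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
    print(form_superblock(cfg, ["A", "B", "D", "E"]))
```

In the diamond example, the main path A, B, D, E is entered from the side at D (through C). After duplication, C jumps to the copy of D, so a scheduler can move instructions along the main path without having to insert compensation code at a join point.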