Detecting SDCs in GPGPUs Through an Efficient Instruction Duplication Mechanism

2021 
As General-Purpose Graphics Processing Units (GPGPUs) are widely used in High-Performance Computing (HPC) applications, their vulnerability to soft errors has become a critical concern. In this paper, we propose an efficient instruction duplication mechanism that duplicates only the instructions vulnerable to silent data corruption (SDC), reducing the overhead of reliability protection. We first observe that the SDC proneness of an individual instruction is related to its instruction type, its fault propagation, and whether it affects shared memory. Then, leveraging these factors, we use machine learning to identify the SDC-vulnerable instructions of GPU applications and protect them efficiently. Experimental results show that our method achieves 90.45% SDC coverage while duplicating only 37.8% of static instructions, a significant improvement in both performance and SDC detection capability over the state-of-the-art duplication technique for GPUs.
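The general idea of selective instruction duplication can be illustrated with a minimal sketch. This is not the authors' implementation: the `Instr` class, the `execute` function, and the `vulnerable` flag are hypothetical names introduced here, and the flag simply stands in for the output of whatever classifier marks an instruction as SDC-vulnerable. The sketch assumes a fault corrupts only the primary copy of a result.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class SDCDetected(Exception):
    """Raised when the primary and duplicate results disagree."""


@dataclass
class Instr:
    name: str
    op: Callable[[int, int], int]
    # Hypothetical label: stands in for a classifier's decision that this
    # instruction is SDC-vulnerable and should be duplicated.
    vulnerable: bool


def execute(instr: Instr, a: int, b: int, fault: Optional[int] = None) -> int:
    """Run one instruction; duplicate and compare only if it is marked vulnerable."""
    result = instr.op(a, b)
    if fault is not None:
        # Assumption: model a soft error as a bit flip in the primary result.
        result ^= fault
    if instr.vulnerable:
        shadow = instr.op(a, b)  # duplicate execution
        if shadow != result:
            raise SDCDetected(instr.name)
    return result


add = Instr("add", lambda a, b: a + b, vulnerable=True)
mul = Instr("mul", lambda a, b: a * b, vulnerable=False)
```

In this toy model, a fault in the protected `add` is caught by the comparison, while the same fault in the unprotected `mul` passes through silently. That is the trade-off the abstract describes: duplicating fewer instructions lowers overhead, so the choice of which instructions to protect determines the SDC coverage.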