Online Fault Detection in ReRAM-Based Computing Systems for Inferencing

2022 
A resistive switching random access memory (ReRAM)-based computing system (RCS) provides an energy-efficient hardware implementation of vector–matrix multiplication for machine-learning hardware. However, it is susceptible to faults due to the immature ReRAM fabrication process. We propose an efficient online fault-detection method for RCS. This method monitors the dynamic power consumption of each ReRAM crossbar and flags a fault when a changepoint is detected in the monitored power-consumption time series. To estimate the percentage of faulty cells in a faulty ReRAM crossbar, we compute statistical features immediately before and after the changepoint and use them as independent variables, with the percentage of faulty cells as the dependent variable, to train a predictive model using machine learning. In this way, the computationally expensive fault-localization and error-recovery steps are carried out only when a high fault rate is estimated. Simulation results show that, with the fault-detection method and the predictive model, the test time is significantly reduced, the hardware overhead is negligible, and high classification accuracy for the MNIST and CIFAR-10 datasets using RCS can still be ensured.
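
The detection step described above can be illustrated with a minimal sketch. The abstract does not say which changepoint algorithm or which statistical features are used, so everything here is an assumption made for illustration: the function names `detect_changepoint` and `window_features`, the single mean-shift test based on a standardized mean difference, the threshold, the window width, and the synthetic power trace. The predictive model that maps these features to a fault percentage is not shown.

```python
import statistics


def detect_changepoint(power, min_seg=10, threshold=5.0):
    """Return the index of the strongest mean shift, or None if it is too weak.

    Assumed method: scan every split point and score it with a standardized
    difference of segment means (a two-sample, t-like statistic).
    """
    n = len(power)
    best_idx, best_score = None, 0.0
    for k in range(min_seg, n - min_seg + 1):
        left, right = power[:k], power[k:]
        # Pooled standard deviation of both segments, guarded against zero.
        pooled_var = (statistics.pvariance(left) * len(left)
                      + statistics.pvariance(right) * len(right)) / n
        pooled = max(pooled_var ** 0.5, 1e-12)
        diff = abs(statistics.fmean(right) - statistics.fmean(left))
        # Scale by the effective sample size of the split.
        score = diff / pooled * (len(left) * len(right) / n) ** 0.5
        if score > best_score:
            best_idx, best_score = k, score
    return best_idx if best_score >= threshold else None


def window_features(power, cp, width=20):
    """Summary statistics just before and just after the changepoint.

    These would serve as the independent variables of a regression model;
    the specific feature set is hypothetical.
    """
    before = power[max(0, cp - width):cp]
    after = power[cp:cp + width]
    return {
        "mean_before": statistics.fmean(before),
        "mean_after": statistics.fmean(after),
        "std_before": statistics.pstdev(before),
        "std_after": statistics.pstdev(after),
        "mean_shift": statistics.fmean(after) - statistics.fmean(before),
    }


# Synthetic per-crossbar power trace (arbitrary units): the level drops at
# sample 100, standing in for the onset of faults.
trace = [1.0, 1.02] * 50 + [0.8, 0.82] * 50
cp = detect_changepoint(trace)
if cp is not None:
    features = window_features(trace, cp)
    print(cp, features["mean_shift"])
```

Scoring the absolute mean difference keeps the sketch agnostic about whether faults raise or lower crossbar power. Only when a changepoint is reported would the features be passed to the trained model, which matches the idea of running costly localization and recovery only for crossbars with a high estimated fault rate.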