FPRA: a fine-grained parallel RRAM architecture

2021 
Emerging resistive-memory (RRAM) based crossbar arrays are a promising technology for accelerating neural network applications. RRAM-based CNN accelerators support a high degree of intra-layer and inter-layer parallelism. Intra-layer parallelism duplicates kernels for each network layer, while inter-layer parallelism allows each layer to execute as soon as a portion of its input data is available. However, previously proposed RRAM-based accelerators do not leverage data sharing between duplicate kernels, leading to significant idleness of crossbar arrays during inference. This shared data creates data dependencies that stall the processing of the next layer in the pipeline. To address these issues, we propose the Fine-grained Parallel RRAM Architecture (FPRA), a novel architectural design that improves parallelism for pipeline-enabled RRAM-based accelerators. FPRA addresses the data sharing issue with kernel batching and data-sharing-aware memory. Kernel batching rearranges the layout of the kernels and minimizes the data dependencies created by the shared input data. The data-sharing-aware memory uniformly buffers the input and output data for each layer, efficiently dispatching data to duplicate kernels while reducing the amount of data transferred between layers. We evaluate FPRA on eight popular image recognition CNN models with various configurations in a cycle-accurate simulator. We find that FPRA achieves a 2.0$\times$ average latency speedup and a 2.1$\times$ average throughput increase compared to state-of-the-art RRAM-based accelerators.
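
As background for the data sharing the abstract refers to, the following is a minimal, illustrative Python sketch (not the paper's implementation; the function names and parameters are hypothetical) of how adjacent sliding windows of a standard convolution overlap, which is the input data that duplicate kernels would otherwise each re-fetch.

    # Illustrative sketch only: quantifies input overlap between adjacent
    # convolution windows, i.e. the shared data that duplicated kernels in a
    # pipelined RRAM accelerator could reuse instead of re-fetching.
    # All names and parameters here are assumptions, not from the paper.

    def shared_rows(kernel_size: int, stride: int) -> int:
        """Input rows reused between two vertically adjacent output windows."""
        return max(kernel_size - stride, 0)

    def reuse_fraction(kernel_size: int, stride: int) -> float:
        """Fraction of each window's input shared with its neighbouring window."""
        return shared_rows(kernel_size, stride) / kernel_size

    if __name__ == "__main__":
        # A typical 3x3, stride-1 layer: adjacent windows share 2 of 3 rows,
        # so duplicate kernels computing neighbouring outputs could reuse
        # roughly two thirds of their input data.
        print(reuse_fraction(kernel_size=3, stride=1))  # ~0.667

For common layer shapes (small kernels, stride 1) this overlap is large, which is why exploiting it with kernel batching and shared buffering can substantially reduce crossbar idleness and inter-layer traffic.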