A High-performance Inference Accelerator Exploiting Patterned Sparsity in CNNs

2020 
Convolutional neural networks (CNNs) have emerged as a critical technology for deep learning, with rapidly growing computation and memory demands. Model compression is widely acknowledged as an effective way to accelerate CNNs. However, most proposed FPGA architectures are inefficient for compressed models, which contain a large number of zero-valued operations. In this work, we propose a sparse CNN inference accelerator on FPGA that exploits the uniform sparsity introduced by pattern pruning to achieve high energy efficiency. Our architecture keeps the sparse weights in a compressed format to reduce storage demands and employs a flexible kernel-stationary dataflow to enable extensive data reuse. In addition, we design flexible computing arrays that can be dynamically reconfigured to balance the workload with low overhead. In particular, the on-chip memory applies a novel data buffering structure with slightly rearranged sequences to address the challenge of access conflicts. Experiments show that our accelerator achieves $316.4~\mathrm{GOP/s} \sim 343.5~\mathrm{GOP/s}$ for VGG16 and ResNet-50.
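To illustrate the idea of patterned sparsity and the compressed weight format described in the abstract, the minimal NumPy sketch below applies pattern pruning to a single 3x3 kernel: the kernel keeps only the positions allowed by one pattern chosen from a small library, so every kernel ends up with the same, predictable number of non-zeros and can be stored as a short value list plus a pattern index. The specific pattern library, the keep-4-of-9 choice, and the function names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Hypothetical 3x3 pattern library, each pattern keeping 4 of 9 weight
# positions. The concrete patterns and counts are illustrative assumptions.
PATTERNS = [
    np.array([[1, 1, 0], [1, 1, 0], [0, 0, 0]], dtype=bool),
    np.array([[0, 1, 1], [0, 1, 1], [0, 0, 0]], dtype=bool),
    np.array([[0, 0, 0], [1, 1, 0], [1, 1, 0]], dtype=bool),
    np.array([[0, 0, 0], [0, 1, 1], [0, 1, 1]], dtype=bool),
]

def pattern_prune(kernel: np.ndarray):
    """Pick the pattern preserving the most weight magnitude, drop the rest."""
    idx = max(range(len(PATTERNS)),
              key=lambda i: np.abs(kernel[PATTERNS[i]]).sum())
    mask = PATTERNS[idx]
    # Compressed format: the surviving weights plus a small pattern index,
    # so every kernel has the same, known sparsity structure.
    return kernel[mask].copy(), idx

rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3)).astype(np.float32)
values, pattern_id = pattern_prune(kernel)
print(values, pattern_id)  # 4 retained weights and the pattern index
```

Because every pruned kernel carries the same number of non-zeros, hardware that reads this format sees a uniform workload per kernel, which is what makes the workload balancing and conflict-free buffering in the proposed accelerator tractable.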