PACA: A Pattern Pruning Algorithm and Channel-Fused High PE Utilization Accelerator for CNNs

2022 
In recent years, convolutional neural networks (CNNs) have achieved significant advances in various fields. However, the computation and storage overheads of CNNs are overwhelming for Internet-of-Things devices. Both network pruning algorithms and hardware accelerators have been introduced to enable CNN inference at the edge. Network pruning algorithms reduce the size and computational cost of CNNs by regularizing unimportant weights to zero. However, existing works lack intrakernel structured sparsity types that trade off between sparsity and hardware efficiency, and the index storage for irregularly pruned networks is significant. Hardware accelerators leverage the sparsity of pruned CNNs to improve energy efficiency, but their processing element (PE) utilization is low because of uneven sparsity among input convolutional kernels. To overcome these problems, we propose PACA: a Pattern pruning Algorithm and Channel-fused high PE utilization Accelerator for CNNs. It comprises three parts: a pattern pruning algorithm that exploits intrakernel sparsity and reduces index storage, a channel-fused hardware architecture that reduces the PEs' idle rate and improves performance, and a heuristic, tabu-search-based smart fusion scheduler that analyzes the idle-PE problem and schedules channel fusion in hardware. To demonstrate the effectiveness of PACA, we implemented the software parts in Python and the hardware architecture in RTL code. Experimental results on various datasets show that, compared with an existing work, PACA reduces the index storage overhead by $3.47\times$–$5.63\times$ with 3.85–9.12 average patterns, and improves hardware performance by $2.01\times$–$5.53\times$ by reducing the PEs' idle rate.
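To make the pattern pruning idea concrete, the sketch below shows one plausible way such a scheme can work: a small dictionary of intra-kernel masks ("patterns") is derived from the weights, and every 3×3 kernel is pruned to its best-matching pattern, so only a short pattern ID per kernel needs to be stored instead of a full irregular index. This is an illustrative NumPy sketch under assumed details (magnitude-based pattern selection, 4 kept weights per kernel, the helper name `prune_to_patterns`), not the paper's exact algorithm.

```python
import numpy as np

def prune_to_patterns(weights, num_patterns=4, keep=4):
    """Illustrative pattern pruning (hypothetical helper, not PACA's
    exact method). `weights` has shape (out_ch, in_ch, 3, 3). Each
    kernel is pruned to one of `num_patterns` shared masks, so the
    per-kernel index cost drops to log2(num_patterns) bits."""
    oc, ic, kh, kw = weights.shape
    flat = np.abs(weights.reshape(-1, kh * kw))

    # Candidate mask per kernel: keep its `keep` largest-magnitude weights.
    order = np.argsort(-flat, axis=1)
    masks = np.zeros_like(flat, dtype=bool)
    np.put_along_axis(masks, order[:, :keep], True, axis=1)

    # Pattern dictionary: the `num_patterns` most frequent candidate masks.
    uniq, counts = np.unique(masks, axis=0, return_counts=True)
    patterns = uniq[np.argsort(-counts)[:num_patterns]]

    # Assign each kernel the pattern preserving the most weight magnitude.
    scores = flat @ patterns.T.astype(float)   # (kernels, num_patterns)
    ids = np.argmax(scores, axis=1)

    pruned = (weights.reshape(-1, kh * kw) * patterns[ids]).reshape(weights.shape)
    return pruned, ids.reshape(oc, ic), patterns
```

Because every kernel now carries one of a handful of shared patterns, the hardware can precompute the pattern shapes instead of storing a per-weight index, which is the source of the index-storage reduction the abstract reports.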