CSCNN: Algorithm-hardware Co-design for CNN Accelerators using Centrosymmetric Filters

2021 
Convolutional neural networks (CNNs) are at the core of many state-of-the-art deep learning models in computer vision, speech, and text processing. Training and deploying such CNN-based architectures usually require a significant amount of computational resources. Sparsity has emerged as an effective compression approach for reducing the amount of data and computation in CNNs. However, sparsity often introduces computational irregularity, which prevents accelerators from fully translating its benefits into performance and energy improvements. In this paper, we propose CSCNN, an algorithm/hardware co-design framework for CNN compression and acceleration that mitigates the effects of computational irregularity and provides better performance and energy efficiency. On the algorithmic side, CSCNN uses centrosymmetric matrices as convolutional filters. In doing so, it reduces the number of required weights by nearly 50% and enables structured computational reuse without compromising regularity or accuracy. Additionally, complementary pruning techniques are leveraged to further reduce computation by $2.8$–$7.2\times$ with a marginal accuracy loss. On the hardware side, we propose a CSCNN accelerator that effectively exploits the structured computational reuse enabled by centrosymmetric filters and further eliminates zero computations for increased performance and energy efficiency. Compared with a dense accelerator, SCNN, and SparTen, the proposed accelerator performs $3.7\times$, $1.6\times$, and $1.3\times$ better, respectively, and improves the energy-delay product (EDP) by $8.9\times$, $2.8\times$, and $2.0\times$.
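To make the centrosymmetric property concrete, the following is a minimal NumPy sketch, not taken from the paper: a $k \times k$ filter $W$ is centrosymmetric if $W_{i,j} = W_{k-1-i,\,k-1-j}$, so only $\lceil k^2/2 \rceil$ weights are free (the ~50% reduction) and each multiply in a convolution window can be shared across a pair of inputs (the structured reuse). The function names here are illustrative, not CSCNN's API.

```python
import numpy as np

def build_centrosymmetric(free_params, k):
    """Expand ceil(k*k/2) free weights into a full k x k centrosymmetric filter."""
    n = k * k
    half = (n + 1) // 2
    assert len(free_params) == half
    flat = np.empty(n)
    flat[:half] = free_params
    # Mirror: flat position p shares its weight with position n-1-p.
    flat[half:] = free_params[: n - half][::-1]
    return flat.reshape(k, k)

k = 3  # this pairing sketch assumes odd k, so one center element pairs with itself
rng = np.random.default_rng(0)
W = build_centrosymmetric(rng.standard_normal((k * k + 1) // 2), k)
assert np.allclose(W, W[::-1, ::-1])  # invariant under 180-degree rotation

# Structured reuse: since w_p == w_{n-1-p}, the dot product over a window can be
# computed as w_p * (x_p + x_{n-1-p}), roughly halving the multiplications.
x = rng.standard_normal((k, k))
wf, xf = W.ravel(), x.ravel()
half = (k * k + 1) // 2
paired = sum(wf[p] * (xf[p] + xf[k * k - 1 - p]) for p in range(half - 1)) \
         + wf[half - 1] * xf[half - 1]  # unpaired center element
assert np.isclose(paired, np.dot(wf, xf))
```

Under this reading, a $3 \times 3$ filter needs 5 stored weights instead of 9, and each output pixel needs 5 multiplies instead of 9; an accelerator can exploit this by adding paired activations before the multiplier, which is the kind of structured reuse the abstract attributes to the CSCNN hardware.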