A Kernel Unfolding Approach to Trade Data Movement with Computation Power for CNN Acceleration

2020 
Convolutional neural networks (CNNs) achieve human-level accuracy on image classification applications. However, their complicated structure demands a large number of multiply-accumulate (MAC) operations and results in a huge data movement cost. This situation becomes worse with the asymmetric growth of computing power and memory speed on von Neumann-based architectures. Recently, processing-in-memory (PIM) designs have been adopted to reduce data communication cost by storing parameters inside the memory. However, the significant cost of feeding the input feature map remains a major challenge, especially for PIM devices with high bandwidth but long access latency. We therefore explore how to trade space in PIM to eliminate this cost. A kernel unfolding technique is proposed to eliminate duplicated feeding of the input feature map while keeping PIM memory cells highly utilized to achieve peak computing throughput. As a result, memory bandwidth is used efficiently and execution time is reduced significantly. The results show that the proposed design achieves up to a 16.2× cycle-count improvement compared to traditional PIM designs.
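The abstract does not spell out the mechanics of kernel unfolding, so the sketch below shows one common interpretation in Python/NumPy: the kernel weights are replicated into a Toeplitz-style matrix so that the entire input feature map is streamed exactly once and the convolution reduces to a single matrix-vector product, trading extra weight storage (the duplicated entries that would occupy PIM cells) for eliminated re-reads of the input. The function name unfold_kernel and the stride-1, zero-padding-free setup are assumptions for illustration, not the paper's exact PIM mapping.

import numpy as np

def unfold_kernel(kernel, in_h, in_w):
    """Build an unfolded (Toeplitz-style) weight matrix W so that a 2-D
    convolution becomes a single matrix-vector product W @ x.flatten().
    Each input element is then streamed once, at the cost of storing
    duplicated kernel weights (a hypothetical sketch of kernel unfolding)."""
    k_h, k_w = kernel.shape
    out_h, out_w = in_h - k_h + 1, in_w - k_w + 1  # 'valid' convolution, stride 1
    W = np.zeros((out_h * out_w, in_h * in_w))
    for oy in range(out_h):
        for ox in range(out_w):
            row = oy * out_w + ox
            for ky in range(k_h):
                for kx in range(k_w):
                    col = (oy + ky) * in_w + (ox + kx)
                    W[row, col] = kernel[ky, kx]  # weight duplicated per output
    return W, (out_h, out_w)

# Usage: stream the input feature map once and multiply.
x = np.random.rand(6, 6)
k = np.random.rand(3, 3)
W, out_shape = unfold_kernel(k, *x.shape)
y = (W @ x.flatten()).reshape(out_shape)

# Cross-check against a direct sliding-window computation.
ref = np.array([[np.sum(x[i:i+3, j:j+3] * k) for j in range(4)] for i in range(4)])
assert np.allclose(y, ref)

In this interpretation, the space overhead grows with the number of output positions per kernel, which is the trade-off the abstract refers to: spare PIM capacity absorbs the duplicated weights so that input feature map traffic over the long-latency interface is paid only once.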