CNN-DMA: A Predictable and Scalable Direct Memory Access Engine for Convolutional Neural Network with Sliding-window Filtering

2021 
Memory bandwidth utilization has become the key performance bottleneck for state-of-the-art variants of neural network kernels. Structures such as depth-wise, point-wise, and atrous convolutions introduce diverse and discontinuous memory access patterns, which hinder efficient activation supply through more frequent cache misses and, consequently, high-penalty DRAM pre-charging. GPUs address this by achieving efficient parallelization through sophisticated CUDA program optimization that reduces memory footprints, which demands significant engineering effort. In this work, we instead propose a programmable direct memory access engine for convolutional neural networks (CNN-DMA) that supports fast activation supply to independent and scalable computing units. CNN-DMA favours a predictable activation streaming approach that completely avoids penalties from bus contention, cache misses, and poorly tuned low-level programs. Furthermore, we enhance the baseline DMA with the capability of out-of-order data supply to filter out unique sliding windows, boosting the performance of the computing infrastructure. Experiments on state-of-the-art neural networks show that CNN-DMA achieves optimal DRAM access efficiency for point-wise convolution layers, while reducing the number of computation rounds by 30% to 70% through sliding-window filtering.
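The sliding-window filtering idea can be illustrated in software: when many of a layer's sliding windows contain identical activation values (e.g., in zero-padded or constant regions), deduplicating them before dispatch cuts the number of compute rounds. The sketch below is a simplified software model, not the paper's hardware design; the function name and parameters are illustrative assumptions.

```python
import numpy as np

def unique_sliding_windows(fmap, k=3, stride=1):
    """Enumerate k x k sliding windows over a 2-D feature map and
    deduplicate identical windows, mimicking (in software) the idea of
    supplying only unique windows to the compute units.
    Hypothetical sketch: names and interface are not from the paper."""
    H, W = fmap.shape
    windows = []
    for i in range(0, H - k + 1, stride):
        for j in range(0, W - k + 1, stride):
            windows.append(fmap[i:i + k, j:j + k])
    seen = set()
    unique_idx = []  # indices of unique windows, in supply order
    for idx, w in enumerate(windows):
        key = w.tobytes()  # hash the window contents for dedup
        if key not in seen:
            seen.add(key)
            unique_idx.append(idx)
    return windows, unique_idx

# A mostly constant feature map yields many duplicate windows.
fmap = np.zeros((8, 8), dtype=np.int8)
fmap[0, 0] = 1
windows, unique_idx = unique_sliding_windows(fmap)
print(len(windows), len(unique_idx))  # total windows vs. unique compute rounds
```

In this toy case only two distinct windows exist (the all-zero window and the one containing the nonzero corner), so 36 enumerated windows collapse to 2 compute rounds; the paper's reported 30%-70% reduction on real networks reflects a much less extreme degree of redundancy.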