A fully pipelined hardware architecture for convolutional neural network with low memory usage and DRAM bandwidth

2017 
As a typical deep learning model, the Convolutional Neural Network (CNN) has shown excellent ability in solving complex classification problems. To apply CNN models in mobile and wearable devices, a fully pipelined hardware architecture adopting a Row Processing Tree (RPT) structure with small memory consumption between convolutional layers is proposed. A modified Row Stationary (RS) dataflow is implemented to evaluate the RPT architecture. Under the same operating-frequency requirement for the two architectures, the experimental results show that the RPT architecture reduces on-chip memory by 91% and DRAM bandwidth by 75% compared with the modified RS dataflow, while the throughput of the modified RS dataflow is 3 times higher than that of the proposed RPT architecture. The RPT architecture achieves 121 fps at 100 MHz while processing a CNN with 4 convolutional layers.
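The abstract does not detail the RPT hardware itself, but the memory saving it claims rests on a general idea: streaming convolution row by row with a small line buffer, so that only a few input rows are resident between layers instead of full feature maps. A minimal software sketch of that row-streaming principle (not the paper's actual design) could look like this:

```python
import numpy as np

def stream_conv2d(rows, kernel):
    """Row-streaming 'valid' 2D convolution: consumes one input row at a
    time and emits an output row as soon as K rows are buffered, keeping
    only K rows in memory rather than the whole feature map. Illustrative
    only; the paper's RPT is a hardware pipeline, not this loop."""
    K = kernel.shape[0]
    buf = []  # line buffer holding the most recent K input rows
    for row in rows:
        buf.append(np.asarray(row, dtype=float))
        if len(buf) > K:
            buf.pop(0)  # oldest row is no longer needed
        if len(buf) == K:
            window = np.stack(buf)  # K x W slab of buffered rows
            width = window.shape[1]
            yield np.array([
                np.sum(window[:, j:j + K] * kernel)
                for j in range(width - K + 1)
            ])
```

Chaining such row streams layer to layer is what keeps inter-layer storage small: each layer only needs its own K-row buffer, which is the property the RPT architecture exploits in hardware.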