An Efficient Accelerator for Deep Convolutional Neural Networks

2020 
Convolutional neural networks (CNNs) have become popular in many recent deep learning applications, from speech recognition to image classification and object detection. Among them, YOLO (You Only Look Once) is a well-known object detection algorithm. YOLO convolutional neural networks require a large number of multiply-accumulate (MAC) operations, so on edge devices special hardware must be designed to speed up the computation. To reduce hardware cost, a new distributed arithmetic (DA) architecture similar to NEDA is proposed, in which the multipliers are replaced by adders. The purpose is to reduce power consumption and area while maintaining high speed and high precision. Mathematical analysis shows that, for operands in two's complement form, DA can implement multiplication using only additions, with a data shift performed at the end, so the operation is carried out by adders rather than multipliers. In addition, maximum pooling is performed after convolution to reduce the bandwidth. Finally, the main feature of this design is that each processing element (PE) can perform 1.78 MAC operations in one clock cycle.
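The idea that distributed arithmetic replaces multipliers with adders and shifts can be illustrated with a minimal software sketch. It is a generic textbook formulation of a bit-serial DA inner product, not the architecture proposed in the paper; the function name `da_dot`, the `bits` parameter, and the MSB-first ordering are assumptions made for illustration. Each input is treated as a two's complement word, the weights selected by each bit position are summed with additions only, and the sign bit contributes with negative weight.

```python
def da_dot(weights, inputs, bits=8):
    """Inner product sum(w * x) using only additions, negation, and shifts.

    Hypothetical illustration: inputs are signed integers that must fit
    in `bits`-bit two's complement; weights are arbitrary integers.
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if any(not lo <= x <= hi for x in inputs):
        raise ValueError("input does not fit in the given bit width")

    mask = (1 << bits) - 1
    encoded = [x & mask for x in inputs]  # two's complement bit patterns

    acc = 0
    for j in range(bits - 1, -1, -1):  # process bit planes MSB first
        # Sum the weights whose input has bit j set (adders only).
        partial = 0
        for w, x in zip(weights, encoded):
            if (x >> j) & 1:
                partial += w
        if j == bits - 1:
            acc = -partial  # the sign bit carries negative weight
        else:
            acc = (acc << 1) + partial  # shift, then accumulate
    return acc


if __name__ == "__main__":
    w = [3, -2, 5]
    x = [7, -4, -128]
    print(da_dot(w, x), sum(a * b for a, b in zip(w, x)))
```

In this sketch the shift is interleaved with accumulation (Horner form); a hardware design could equally sum the bit planes first and apply the shifts at the end, as the abstract describes.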