An FPGA-Based Reconfigurable CNN Accelerator for YOLO

2020 
Convolutional neural networks (CNNs) are widely used in image processing. CNN-based object detection models, such as YOLO and SSD, have proven to be state of the art in many applications. CNNs place extremely high demands on computing power and memory bandwidth, so they usually need to be deployed on a dedicated hardware platform. FPGAs offer strong advantages in reconfigurability and performance per watt, which makes them a suitable choice for deploying CNNs. In this paper, we propose a reconfigurable CNN accelerator with an AXI bus interface, based on an ARM + FPGA architecture. The accelerator receives configuration signals sent by the ARM processor and performs the inference computation of the different CNN layers through time-sharing. By combining the convolution and pooling operations, the amount of data movement between convolutional and pooling layers is reduced, which in turn reduces the number of off-chip memory accesses. Floating-point numbers are converted into a 16-bit dynamic fixed-point format, which improves computational performance. We implemented the proposed architecture on the Xilinx ZCU102 FPGA for the YOLOv2 and YOLOv2 Tiny models, on COCO and VOC 2007 respectively, with a peak performance of 289 GOPS at a 300 MHz clock frequency.
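The abstract does not state how the fractional length of the 16-bit dynamic fixed-point format is chosen. As a minimal sketch, assuming one fractional length per tensor derived from its largest magnitude (a common convention, not necessarily the authors' scheme), the conversion could look like the following. The names `choose_frac_bits`, `quantize`, and `dequantize` are hypothetical and introduced only for illustration.

```python
import math

WORD_BITS = 16                          # word width named in the abstract
Q_MIN = -(1 << (WORD_BITS - 1))         # -32768
Q_MAX = (1 << (WORD_BITS - 1)) - 1      #  32767


def choose_frac_bits(values):
    """Assumed policy: use the largest fractional length that still
    represents the tensor's maximum magnitude in a signed 16-bit word."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return WORD_BITS - 1
    # Bits needed for the integer part of the magnitude (may be negative
    # for small values, which yields more than 15 fractional bits).
    int_bits = math.floor(math.log2(max_abs)) + 1
    return WORD_BITS - 1 - int_bits


def quantize(values, frac_bits):
    """Scale by 2**frac_bits, round to nearest (ties to even), saturate."""
    scale = 2.0 ** frac_bits
    return [min(Q_MAX, max(Q_MIN, round(v * scale))) for v in values]


def dequantize(codes, frac_bits):
    """Map 16-bit integer codes back to real values."""
    scale = 2.0 ** frac_bits
    return [c / scale for c in codes]


# Hypothetical example tensors: small weights and larger activations
# end up with different fractional lengths, hence "dynamic".
weights = [0.75, -0.5, 0.125]
activations = [3.5, -1.25]

w_fb = choose_frac_bits(weights)
a_fb = choose_frac_bits(activations)
w_codes = quantize(weights, w_fb)
a_codes = quantize(activations, a_fb)
```

Because each tensor carries its own fractional length, small-valued weights keep more fractional precision than large-valued activations while both still fit in the same 16-bit datapath.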