An Effective Design to Improve the Efficiency of DPUs on FPGA

2020 
Convolutional neural networks (CNNs) have been widely used in various complicated problems, such as image classification, objection detection, semantic segmentation. To meet diversified CNN structures, the deep learning processing unit (DPU) is designed as a general accelerator on field programmable gate array (FPGA) to support various CNN layers, such as convolution, pooling, activation, etc. However, low DPU utilization and schedule efficiency appear when DPU used to multitask application completed by CNN models. In this paper, an effective design including multi-core with different size (MCDS) and DPU Plus is proposed to improve the efficiency of DPUs usage from the two dimensions of time and space. Through increasing the number of DPU cores on an FPGA and the utilization of single DPU core, the design of MCDS can effectively improve the overall throughput with restricted on-chip resources. Furthermore, the design of DPU Plus is proposed to improve the schedule efficiency of DPUs through simultaneously implementing DPU with other significant auxiliary modules of the application system on the same FPGA. Finally, a color space conversion module is implemented cooperate to the DPU cores to testify its performance, and the experimen shows that compared with running on the the CPU completely, it achieves16.2x acceleration, and increases the throughput of the entire system by 3.0x.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    27
    References
    1
    Citations
    NaN
    KQI
    []