Data Optimization CNN Accelerator Design on FPGA

2019 
Image understanding is becoming a vital feature in an ever-growing range of applications, from medical diagnostics to autonomous vehicles. Many of these applications demand embedded solutions that integrate into existing systems under tight power and real-time constraints. Convolutional Neural Networks (CNNs) currently achieve record-breaking accuracy on all major image understanding benchmarks, but at a very high computational cost. Modern high-end FPGA generations feature hundreds of thousands of configurable logic blocks and additionally include an abundance of hardened functional units that enable fast and efficient implementations of common operations. Many researchers have proposed CNN accelerator prototypes on FPGAs. However, one shortcoming of state-of-the-art designs is that they do not handle data dependence well, even though data dependency is an important factor limiting accelerator performance. Current designs address the data dependence problem by adding hardware modules on the FPGA, but this approach has limited effect and increases hardware complexity. In this paper, we propose an optimization of the data arrangement in CNNs that resolves data dependences by rearranging the data and storing it in a hardware-friendly form. In this way, our accelerator can exploit pipelining better than current designs. We validate our approach on a Xilinx Zynq XC-7Z045 board. The experimental results show that our approach has clear advantages in hardware resource consumption and bandwidth compared to state-of-the-art designs.
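The abstract does not specify the exact rearrangement used; as an illustration of the general idea, the following minimal Python sketch (a hypothetical im2col-style layout, not necessarily the authors' scheme) flattens each sliding convolution window into a contiguous row, so that a hardware pipeline can stream multiply-accumulates without address-dependent, strided reads:

```python
# Hypothetical sketch of a hardware-friendly data rearrangement.
# Each k x k convolution window is copied into one contiguous row,
# so successive multiply-accumulates read sequential addresses and
# a pipeline need not stall on strided feature-map accesses.

def rearrange_windows(fmap, k):
    """Flatten every k x k sliding window of a 2-D feature map
    (list of lists) into one contiguous row (im2col-style)."""
    h, w = len(fmap), len(fmap[0])
    rows = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            window = []
            for di in range(k):
                window.extend(fmap[i + di][j:j + k])
            rows.append(window)
    return rows

fmap = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
print(rearrange_windows(fmap, 2))
# -> [[1, 2, 4, 5], [2, 3, 5, 6], [4, 5, 7, 8], [5, 6, 8, 9]]
```

With this layout, a convolution reduces to a dot product of each row with the flattened kernel, at the cost of duplicating overlapping pixels in memory.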