Heterogeneous Edge CNN Hardware Accelerator

2020 
We describe a programmable and scalable Convolutional Neural Network (CNN) hardware accelerator optimized for mobile and edge inference computing. The accelerator comprises four heterogeneous engines: an input engine, a filter engine, a post-processing engine, and an output engine. The specialized engines execute independently and concurrently. All engines share a core set of common instructions, with each engine further specialized for specific functions. We describe the operation of each engine and provide silicon-validated results for a number of CNN networks, including LeNet-5, TinySSD, and SqueezeNet. We also describe a blind modulation detection application using SqueezeNet. The accelerator has been fabricated in 28 nm CMOS and runs at 1 GHz. The logic consumes 0.6 mm², and the fully hardened core with 2 MB of SRAM, including built-in self-test, consumes 9.36 mm². The accelerator's filter engine implements 288 f16 multipliers, thereby achieving 288 GFLOPS at 1 GHz. Two TOPS of peak performance is achieved with all engines running in parallel. The accelerator, including SRAM, dissipates 193 mW running LeNet-5 at room temperature.
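A rough sanity check of the filter-engine figure, as a sketch rather than anything from the paper itself; it assumes one f16 multiply issued per multiplier per cycle, using only the numbers quoted in the abstract:

```python
# Back-of-the-envelope check of the quoted filter-engine throughput.
# Assumption (not stated in the abstract): each multiplier produces
# one f16 result per clock cycle, counted as one floating-point op.

clock_hz = 1e9             # 1 GHz core clock (from the abstract)
num_f16_multipliers = 288  # multipliers in the filter engine (from the abstract)

filter_engine_gflops = num_f16_multipliers * clock_hz / 1e9
print(f"Filter engine peak: {filter_engine_gflops:.0f} GFLOPS")  # -> 288 GFLOPS
```

The 2 TOPS figure is quoted for all four engines running in parallel; the abstract does not break down how the remaining engines contribute, so that number is not reproduced here.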