Harnessing Numerical Flexibility for Deep Learning on FPGAs

2018 
Deep learning has become a key workload in the data centre and at the edge, leading to an arms race for compute dominance in this space. FPGAs have shown they can compete by combining deterministic low latency with high throughput and flexibility. In particular, their bit-level programmability lets FPGAs efficiently implement arbitrary precisions and numeric data types, which is critical in a fast-evolving field like deep learning. In this work, we explore minifloat (floating point representations with non-standard exponent and mantissa sizes) implementations on the FPGA, and show how a block floating point implementation that shares one exponent across many numbers reduces the logic required to perform floating point operations. We show that this technique significantly improves FPGA performance with no impact on accuracy, reducing logic utilization by 3x and the required memory bandwidth and capacity by more than 40%.
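The abstract does not include code; as a minimal illustration of the block floating point idea it describes, the NumPy sketch below quantizes a block of values to fixed-point mantissas under a single shared exponent and converts back. The function names, the 8-bit mantissa width, and the exponent-selection rule are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def block_float_quantize(x, mantissa_bits=8):
    """Quantize a 1-D block to block floating point: one shared
    exponent for the whole block, signed fixed-point mantissas.
    (Illustrative sketch; not the paper's implementation.)"""
    max_abs = np.max(np.abs(x))
    if max_abs == 0.0:
        return np.zeros(len(x), dtype=np.int32), 0
    # Shared exponent chosen from the largest magnitude in the block,
    # so no mantissa overflows its signed range.
    shared_exp = int(np.floor(np.log2(max_abs)))
    scale = 2.0 ** (shared_exp + 2 - mantissa_bits)
    mantissas = np.clip(np.round(x / scale),
                        -(2 ** (mantissa_bits - 1)),
                        2 ** (mantissa_bits - 1) - 1).astype(np.int32)
    return mantissas, shared_exp

def block_float_dequantize(mantissas, shared_exp, mantissa_bits=8):
    """Reconstruct approximate real values from a block."""
    scale = 2.0 ** (shared_exp + 2 - mantissa_bits)
    return mantissas.astype(np.float64) * scale

# Example: values far from the block maximum lose precision,
# but the shared exponent is stored only once per block.
x = np.array([0.031, -1.7, 0.42, 3.1])
m, e = block_float_quantize(x)
print(m, e, block_float_dequantize(m, e))
```

The hardware payoff this sketch hints at: because every value in a block carries the same exponent, a dot product between two blocks reduces to integer multiply-accumulates followed by a single exponent addition at the end, eliminating the per-operation mantissa alignment and normalization logic that makes full floating point expensive in FPGA fabric.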