Synthesis of program binaries into FPGA accelerators with runtime dependence validation

2017 
With the emergence of readily available FPGA cloud computing platforms, ease of use for application developers becomes increasingly crucial to widespread adoption. Synthesis directly from binaries has been proposed as an option to alleviate the design burden. However, in program binaries, loop bounds and loop invariants used for memory index calculation are often compiled into runtime data stored in registers or memory, making static loop dependence analysis infeasible. In this work, a two-phase approach is presented to address this issue: 1) an offline phase that uses software profiling to recover the loop's memory access patterns for data dependence analysis, and 2) an online phase that dynamically checks the parallelization assertions. We use this method to discover and exploit coarse-grained parallelism for accelerating compute-intensive affine loops in binaries. On our target platform, the Zynq-7000 FPGA SoC, we ran and examined four benchmarks with our flow: GemsFDTD, Matrix Multiply, Sobel Edge Detection, and K-Nearest Neighbors. Results show up to 9.5x speedup with our flow compared to the pure software flow on the 667 MHz ARM Cortex-A9 processor.
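To make the online phase concrete, the following is a minimal sketch of a runtime dependence check for an affine loop, not the paper's implementation. It assumes the offline phase has summarized each memory access as `base + stride * i` in element units. The names `AffineAccess`, `may_conflict`, and `safe_to_parallelize` are hypothetical, and the range-overlap test is a standard conservative check swapped in for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AffineAccess:
    """Hypothetical model of one access: address = base + stride * i."""
    base: int       # known only at runtime (e.g. held in a register)
    stride: int     # recovered by offline profiling in this sketch
    is_write: bool


def address_range(acc, trip_count):
    """Inclusive [lo, hi] range of addresses touched over all iterations."""
    first = acc.base
    last = acc.base + acc.stride * (trip_count - 1)
    return min(first, last), max(first, last)


def may_conflict(a, b, trip_count):
    """Conservative test for a possible loop-carried dependence between a and b."""
    if not (a.is_write or b.is_write):
        return False  # two reads never conflict
    if a.base == b.base and a.stride == b.stride and a.stride != 0:
        return False  # each iteration touches only its own element
    lo_a, hi_a = address_range(a, trip_count)
    lo_b, hi_b = address_range(b, trip_count)
    return lo_a <= hi_b and lo_b <= hi_a  # overlapping ranges may alias


def safe_to_parallelize(accesses, trip_count):
    """Return True only if no pair of accesses may carry a dependence."""
    if trip_count <= 1:
        return True
    for i, a in enumerate(accesses):
        # A write that hits the same address every iteration depends on itself.
        if a.is_write and a.stride == 0:
            return False
        for b in accesses[i + 1:]:
            if may_conflict(a, b, trip_count):
                return False
    return True


# Hypothetical loop C[i] = A[i] + B[i] with bases observed at runtime.
loop_accesses = [
    AffineAccess(base=0, stride=1, is_write=False),     # A[i]
    AffineAccess(base=1000, stride=1, is_write=False),  # B[i]
    AffineAccess(base=2000, stride=1, is_write=True),   # C[i]
]
```

In this sketch the check depends on the runtime trip count: `safe_to_parallelize(loop_accesses, 100)` passes, while a trip count of 1500 makes the `B` and `C` ranges overlap and the check fails. A plausible use, stated here as an assumption, is to launch the parallel accelerator only when the check passes and to fall back to the sequential software loop otherwise.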