Synthesis of program binaries into FPGA accelerators with runtime dependence validation

Shaoyi Cheng, Qijing Huang, John Wawrzynek

2017

With the emergence of readily available FPGA cloud computing platforms, ease of use for application developers becomes increasingly crucial to widespread adoption. Synthesis directly from binaries has been proposed as an option to alleviate the design burden. However, in program binaries, loop bounds and loop invariants used for memory index calculation are often compiled into runtime data stored in registers or memory, making static loop dependence analysis infeasible. In this work, a two-phase approach is presented to address this issue: 1) an offline phase that uses software profiling to recover the memory access patterns in the loop for data dependence analysis, and 2) an online phase that dynamically checks the assertions under which parallelization is valid. We use this method to discover and exploit coarse-grained parallelism for accelerating compute-intensive affine loops in binaries. On our target platform, the Zynq-7000 FPGA SoC, we ran and examined four benchmarks with our flow: GemsFDTD, Matrix Multiply, Sobel Edge Detection, and K-Nearest Neighbors. Results show up to 9.5x speedup with our flow compared to the pure software flow on the 667 MHz ARM Cortex-A9 processor.
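To make the idea of a runtime dependence check concrete, here is a minimal sketch in Python. It is not the authors' implementation, and the abstract does not describe their actual checks. It assumes that profiling has already summarized each memory stream in a loop as an affine pattern (base + stride * i), with the base, stride, and trip count only known once the loop is entered. The names `AffineAccess`, `ranges_disjoint`, and `safe_to_offload` are hypothetical. The check is deliberately conservative: it only accepts the loop when every written byte range is disjoint from every other stream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AffineAccess:
    """Hypothetical summary of one stream: address = base + stride * i, 0 <= i < trip_count."""
    base: int        # first address, known only at runtime
    stride: int      # bytes between consecutive accesses (may be negative)
    trip_count: int  # number of loop iterations, known only at runtime
    width: int       # bytes touched per access

    def byte_range(self):
        """Return the half-open [lo, hi) byte interval covered by the stream."""
        if self.trip_count <= 0:
            return (self.base, self.base)  # empty interval
        last = self.base + self.stride * (self.trip_count - 1)
        lo = min(self.base, last)
        hi = max(self.base, last) + self.width
        return (lo, hi)


def ranges_disjoint(a, b):
    """True if the two streams cannot touch a common byte."""
    a_lo, a_hi = a.byte_range()
    b_lo, b_hi = b.byte_range()
    if a_lo == a_hi or b_lo == b_hi:
        return True  # an empty stream conflicts with nothing
    return a_hi <= b_lo or b_hi <= a_lo


def safe_to_offload(writes, reads):
    """Conservative assertion: each write stream is disjoint from all reads and other writes."""
    for i, w in enumerate(writes):
        if any(not ranges_disjoint(w, r) for r in reads):
            return False
        if any(not ranges_disjoint(w, o) for o in writes[i + 1:]):
            return False
    return True


# Example values are made up for illustration.
out = AffineAccess(base=0x2000, stride=4, trip_count=256, width=4)
src = AffineAccess(base=0x1000, stride=4, trip_count=256, width=4)
aliased = AffineAccess(base=0x23FC, stride=4, trip_count=256, width=4)

ok = safe_to_offload([out], [src])           # separate buffers
conflict = safe_to_offload([out], [aliased])  # read overlaps the last written word
```

In a flow like the one described, a passing check would allow the parallel accelerator to run, while a failing check would fall back to the original software loop. An interval test of this kind rejects some loops that are in fact safe (for example, interleaved strided streams), which is the price of keeping the check cheap.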
