Synthesis of program binaries into FPGA accelerators with runtime dependence validation (Best Paper Award)

Synthesis directly from binaries has been proposed as an option to alleviate the design burden. However, in program binaries, loop bounds and loop invariants used for memory index calculation are often compiled into runtime data stored in registers or memory, making static loop dependence analysis infeasible. In this work, we present a two-phase approach to address this issue: 1) an offline phase that uses software profiling to recover the memory access patterns in the loop for data dependence analysis, and 2) an online phase that dynamically checks the assertions under which parallelization is valid. We use this method to discover and exploit coarse-grained parallelism for accelerating compute-intensive affine loops in binaries. On our target platform, the Zynq-7000 FPGA SoC, we ran and examined four benchmarks with our flow: GemsFDTD, Matrix Multiply, Sobel Edge Detection, and K-Nearest Neighbors. Results show speedups of up to 9.5x with our flow compared to the pure software flow on the ARM Cortex-A9 processor.
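The two-phase idea can be illustrated with a minimal sketch. This is not the flow's actual implementation: the names `AffineAccess`, `fit_affine`, `address_range`, and `safe_to_parallelize` are hypothetical, and the sketch assumes a loop with one write stream and one read stream, each of the form base + stride * i. The offline step fits that form to a profiled address trace; the online step takes the base addresses and trip count that are only known at runtime and conservatively asserts that the two address ranges are disjoint before the loop would be handed to the accelerator.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AffineAccess:
    """Hypothetical model of one memory access: address = base + stride * i."""
    base: int    # byte address touched at iteration 0
    stride: int  # bytes advanced per loop iteration


def fit_affine(trace: List[int]) -> Optional[AffineAccess]:
    """Offline phase (sketch): recover an affine pattern from profiled addresses.

    Returns None if the trace is too short or is not affine in the iteration index.
    """
    if len(trace) < 2:
        return None
    stride = trace[1] - trace[0]
    for i, addr in enumerate(trace):
        if addr != trace[0] + stride * i:
            return None
    return AffineAccess(base=trace[0], stride=stride)


def address_range(acc: AffineAccess, trip_count: int, width: int) -> Tuple[int, int]:
    """Half-open byte range [lo, hi) touched over trip_count iterations."""
    first = acc.base
    last = acc.base + acc.stride * (trip_count - 1)
    return min(first, last), max(first, last) + width


def safe_to_parallelize(write: AffineAccess, read: AffineAccess,
                        trip_count: int, width: int) -> bool:
    """Online phase (sketch): conservative assertion evaluated with runtime values.

    True only if the write and read ranges cannot overlap, so no iteration
    can depend on another through these two accesses.
    """
    if trip_count <= 0:
        return True  # nothing executes, so nothing can conflict
    w_lo, w_hi = address_range(write, trip_count, width)
    r_lo, r_hi = address_range(read, trip_count, width)
    return w_hi <= r_lo or r_hi <= w_lo
```

In this sketch, a failed check would mean falling back to the original software loop. The range-disjointness test is deliberately conservative: it can reject loops that are in fact parallel (for example, interleaved accesses that never touch the same element), which is the price of keeping the runtime check cheap.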