As high-level synthesis (HLS) moves towards mainstream adoption among FPGA designers, it has proven to be an effective method for rapid hardware generation. However, in the context of offloading compute-intensive software kernels to FPGA accelerators, current HLS tools do not always take full advantage of the hardware platforms. In this paper, we present an automatic flow to refactor and restructure processor-centric software implementations, making them better suited for FPGA platforms. The methodology generates pipelines that decouple memory operations and data access from computation. The resulting pipelines have much better throughput due to their efficient use of the memory bandwidth and improved tolerance to data access latency. The methodology complements existing work in high-level synthesis, easing the creation of heterogeneous systems with high-performance accelerators and general-purpose processors. With this approach, for a set of non-regular algorithm kernels written in C, a performance improvement of 3.3x to 9.1x is observed over direct C-to-hardware mapping using a state-of-the-art HLS tool.
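To make the idea of decoupling data access from computation concrete, the following is a minimal software sketch in C, not the output of the flow described here. It assumes a hypothetical irregular kernel (a gather-and-accumulate over an index array) and splits it into an access stage and a compute stage joined by a small FIFO. All names (`gather_sum_direct`, `gather_sum_decoupled`, `fifo_t`, `FIFO_DEPTH`) are illustrative assumptions. The two stages are interleaved sequentially here only to emulate the behavior; on an FPGA they would run concurrently, so the access stage could run ahead and hide memory latency.

```c
#include <assert.h>
#include <stddef.h>

#define FIFO_DEPTH 16 /* hypothetical buffer depth between the two stages */

/* Baseline: address generation, memory access, and computation in one loop.
 * Each data-dependent load sits on the critical path of the accumulation. */
long gather_sum_direct(const int *data, const int *idx, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[idx[i]];
    return sum;
}

/* Simple ring-buffer FIFO standing in for a hardware stream. */
typedef struct {
    int buf[FIFO_DEPTH];
    size_t head, tail, count;
} fifo_t;

static void fifo_push(fifo_t *f, int v) {
    assert(f->count < FIFO_DEPTH);
    f->buf[f->tail] = v;
    f->tail = (f->tail + 1) % FIFO_DEPTH;
    f->count++;
}

static int fifo_pop(fifo_t *f) {
    assert(f->count > 0);
    int v = f->buf[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return v;
}

/* Access stage: performs only the indirect loads and forwards the values.
 * Runs until the FIFO is full or the input is exhausted; returns the new
 * position in the index array. */
static size_t access_stage(const int *data, const int *idx,
                           size_t pos, size_t n, fifo_t *out) {
    while (pos < n && out->count < FIFO_DEPTH)
        fifo_push(out, data[idx[pos++]]);
    return pos;
}

/* Compute stage: consumes whatever is buffered and never touches memory. */
static long compute_stage(fifo_t *in, long sum) {
    while (in->count > 0)
        sum += fifo_pop(in);
    return sum;
}

/* Decoupled version: alternates the two stages until all data is consumed. */
long gather_sum_decoupled(const int *data, const int *idx, size_t n) {
    fifo_t q = {{0}, 0, 0, 0};
    long sum = 0;
    size_t pos = 0;
    while (pos < n || q.count > 0) {
        pos = access_stage(data, idx, pos, n, &q);
        sum = compute_stage(&q, sum);
    }
    return sum;
}
```

Both functions return the same result for the same inputs; the difference is structural. In the decoupled form the compute stage depends only on the FIFO, which is the property that lets a hardware pipeline tolerate variable access latency.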