This paper describes a general framework for transforming a sequential program into a network of processes, which are then converted to hardware accelerators through high-level synthesis. It also proposes a complementary technique for static deadlock analysis of the generated accelerator network. Our model incorporates the interactions between the accelerators’ schedules, the capacity of the communication channels in the network, and the memory access mechanisms, so that potential artificial deadlocks can be detected and resolved a priori. An algorithm optimized for FPGA implementation is developed and applied through our transformation framework. A set of irregular computation kernels is converted into networks of FPGA accelerators. Compared to hardware accelerators generated without our transformation, the accelerator networks achieve significantly better performance.
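To make the notion of an artificial deadlock concrete, the following is a minimal sketch and not the paper's analysis. It assumes each process follows a fixed, data-independent sequence of blocking channel operations, and that every channel has exactly one writer and one reader. Under those assumptions the order in which enabled operations fire does not affect whether the network completes, so a greedy replay of the schedules is enough to decide it. The function name `runs_to_completion`, the schedule encoding, and the two-process example are all hypothetical, and accelerator timing and memory access mechanisms are not modeled.

```python
def runs_to_completion(schedules, capacity):
    """Replay fixed per-process schedules over bounded FIFO channels.

    schedules: dict mapping process name -> list of (op, channel),
               where op is "put" (blocking write) or "get" (blocking read).
    capacity:  dict mapping channel name -> FIFO depth in tokens.
    Returns True if every process finishes, False if the network deadlocks.
    Assumes one writer and one reader per channel (illustrative model only).
    """
    fill = {ch: 0 for ch in capacity}   # tokens currently buffered per channel
    pc = {p: 0 for p in schedules}      # next operation index per process
    progressed = True
    while progressed:
        progressed = False
        for p, ops in schedules.items():
            # Run process p until it blocks or finishes.
            while pc[p] < len(ops):
                op, ch = ops[pc[p]]
                if op == "put" and fill[ch] < capacity[ch]:
                    fill[ch] += 1
                elif op == "get" and fill[ch] > 0:
                    fill[ch] -= 1
                else:
                    break               # blocked on a full or empty channel
                pc[p] += 1
                progressed = True
    return all(pc[p] == len(ops) for p, ops in schedules.items())


# Hypothetical example: the producer emits both "a" tokens before any "b"
# token, while the consumer wants a "b" token first.
schedules = {
    "producer": [("put", "a"), ("put", "a"), ("put", "b"), ("put", "b")],
    "consumer": [("get", "b"), ("get", "a"), ("get", "b"), ("get", "a")],
}

shallow = runs_to_completion(schedules, {"a": 1, "b": 1})
deeper = runs_to_completion(schedules, {"a": 2, "b": 1})
print("depth a=1:", "completes" if shallow else "deadlocks")
print("depth a=2:", "completes" if deeper else "deadlocks")
```

With a depth-1 FIFO on `a`, the producer blocks on its second write while the consumer waits on the empty `b` channel, so neither can proceed. Deepening `a` to two tokens removes the deadlock without changing either schedule, which is what makes this deadlock artificial: it comes from buffer sizing and not from the program itself.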