A Case for OS-Friendly Hardware Accelerators

Abstract

Modern SoCs make extensive use of specialized hardware accelerators to meet the energy-efficiency requirements of demanding applications, such as computer graphics and video encoding/decoding. Unfortunately, the state of the art is a sea of heterogeneous fixed-function processing units wired together in an ad hoc fashion, with dedicated memory spaces and a wide variety of host-accelerator synchronization mechanisms. This cumbersome approach makes it difficult to accelerate a mix of multi-programmed applications running on a conventional operating system, and it adds considerable communication overhead that reduces the achievable speedup for a wide range of applications. We propose that accelerators adopt a more standardized, OS-friendly interface to ease integration and improve performance on a wider range of code. Our framework standardizes the host-accelerator communication interface, provides a memory consistency model, and specifies the minimal requirements for virtual memory support. To evaluate the feasibility of our proposal, we conduct a case study in which we modify an existing data-parallel accelerator. When the integrated accelerator and processor system is pushed through TSMC’s 45nm process, we observe overheads of only 1.8% in area and 2.5% in energy, illustrating that the cost of building OS-friendly accelerators can be minimal.
