Speaker: Michael Anderson
Title: A Framework for Composing High-Performance OpenCL from Python Descriptions
Abstract: We would like to write programs in productivity languages such as Python or MATLAB and achieve performance comparable to the best hand-tuned code. One approach toward this ideal is to write libraries that achieve high efficiency on certain operations, and to call these libraries from the productivity environment. However, this approach can perform poorly because it does not fuse operations across library calls, and it may ignore runtime information such as the shapes and sizes of the data.
In this talk, I will present Hindemith, a performance framework, programmable from Python, for automatically composing custom data-parallel kernels. In this approach, efficiency programmers write and/or generate customized OpenCL snippets at runtime, and Hindemith automatically fuses, compiles, and executes these operations based on a Python description. The efficiency programmer can leverage runtime information such as the shapes and sizes of data structures. Hindemith achieves state-of-the-art performance on two very different applications. For a space-time adaptive radar processing application, our framework’s implementation is competitive with a hand-coded implementation using NVIDIA’s CUBLAS library. For optical flow, a computer vision application, the framework achieves between 0.5x and 0.97x of the performance of hand-coded OpenCL.
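
To give a rough sense of what fusing operations and using runtime sizes can mean, here is a minimal sketch in plain Python. It is not Hindemith's API: the function `fuse_elementwise`, its arguments, and the generated kernel layout are all hypothetical and chosen only for illustration. It composes several elementwise OpenCL expressions into one kernel source string, with the array length baked in as a constant, so the intermediate results stay in a register instead of making a round trip through memory between separate library calls.

```python
# Illustrative sketch only: none of these names come from Hindemith.

def fuse_elementwise(name, snippets, n):
    """Compose elementwise OpenCL snippets into one kernel source string.

    Each snippet is a C expression written in terms of `x`, the running
    value for the current element. `n` is the array length, known only at
    runtime, and is emitted into the kernel as a literal constant.
    """
    # One assignment per snippet; all of them run inside the same kernel.
    body = "\n".join(f"    x = {expr};" for expr in snippets)
    return (
        f"__kernel void {name}(__global const float *in, __global float *out) {{\n"
        f"    const int i = get_global_id(0);\n"
        f"    if (i >= {n}) return;  /* bound specialized at runtime */\n"
        f"    float x = in[i];\n"
        f"{body}\n"
        f"    out[i] = x;\n"
        f"}}\n"
    )

# Three logical operations (scale, shift, square) become a single kernel.
source = fuse_elementwise(
    "scale_shift_square",
    ["x * 2.0f", "x + 1.0f", "x * x"],
    n=1024,
)
print(source)
```

The resulting string could then be handed to an OpenCL runtime for compilation; that step is omitted here because it depends on the available device and bindings.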