The SEJITS framework supports creating domain-specific embedded languages (DSELs) and their code generators with much less effort than creating a full DSL compiler, typically just a few hundred lines of code. A DSEL paired with its code generator is called a specializer. SEJITS' main benefit is that application writers can stay entirely in a high-level language such as Python: specialized Python functions (that is, functions written in one of the Python-embedded DSELs) generate code that runs at native speed. One existing SEJITS DSEL is Sepya, a Python DSEL for stencil computations that generates OpenMP and Cilk+ code competitive with dedicated stencil DSL compilers such as Pochoir and Halide. We extend Sepya to generate OpenCL code for GPUs and, in the process, extend SEJITS with support for meta-specializers, whose job is to enable and optimize the composition of existing specializers written by third parties. In the case of Sepya, meta-specialization consists of detecting and removing extraneous data copies to and from the GPU when multiple stencils and related operations are composed. We also explore variants of loop fusion to further improve the performance of composed operations. The generated stencil code is 20× faster than SciPy and competitive with existing stencil DSELs on realistic code excerpts. Because meta-specializers must compose and optimize specializers created by third parties, we also add meta-specializer hooks to SEJITS, which allow existing specializers to be incrementally enabled for meta-specialization without breaking backward compatibility. Together, the Sepya and SEJITS extensions widen the range of platforms for which highly optimized code can be generated and open new possibilities for optimizing the composition of existing specializers.
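The copy-elimination idea can be illustrated with a minimal sketch. It is not Sepya's or SEJITS' actual API: `Op`, `naive_plan`, and `elide_round_trips` are hypothetical names, and the "plan" is a plain list of symbolic steps rather than real OpenCL calls. The sketch assumes the host never reads the intermediate result between two composed stencils, which is what makes the round trip redundant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One symbolic step in an execution plan (hypothetical, for illustration)."""
    kind: str       # "to_device", "kernel", or "to_host"
    buf: str        # name of the buffer the step acts on
    name: str = ""  # kernel name; empty for copies


def naive_plan(stencils, buf="grid"):
    """Compose stencils as if each specializer ran in isolation:
    every stencil copies the buffer in, runs, and copies it back out."""
    plan = []
    for stencil in stencils:
        plan.append(Op("to_device", buf))
        plan.append(Op("kernel", buf, stencil))
        plan.append(Op("to_host", buf))
    return plan


def elide_round_trips(plan):
    """Remove each device-to-host copy that is immediately followed by a
    host-to-device copy of the same buffer.

    Assumption: the host does not use the buffer between the two copies,
    so the data can simply stay on the device.
    """
    out = []
    for op in plan:
        redundant = (
            out
            and op.kind == "to_device"
            and out[-1].kind == "to_host"
            and out[-1].buf == op.buf
        )
        if redundant:
            out.pop()   # drop the copy back to the host
            continue    # and skip the copy back to the device
        out.append(op)
    return out


naive = naive_plan(["blur", "laplacian", "sharpen"])
optimized = elide_round_trips(naive)
copies_before = sum(op.kind != "kernel" for op in naive)
copies_after = sum(op.kind != "kernel" for op in optimized)
```

Under these assumptions, composing three stencils needs six copies in the naive plan but only two after the pass: one upload before the first kernel and one download after the last. A real meta-specializer would also have to check data dependences and buffer aliasing before removing a copy.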