Write once, run in Python, Cilk, OpenMP, the cloud…

The SEJITS component of ASPIRE is designed to dramatically simplify creating performant and energy-efficient code that can be retargeted to a variety of platforms.  Begun as part of the Par Lab, the SEJITS approach uses domain-specific languages embedded in Python to generate fast, efficient code for the underlying hardware platform.

Peter Birsinger, Richard Xia, Shoaib Kamil, and Armando Fox of ASPIRE will present a short paper at the ACM International Conference on Information and Knowledge Management (CIKM 2013) on “Scalable Bootstrapping for Python”.  The paper introduces a SEJITS specializer, or DSEL (domain-specific embedded language) compiler, for the Bag of Little Bootstraps (BLB), a recently developed bootstrapping algorithm designed for distributed environments.  We already had a BLB specializer that generated OpenMP or Cilk code for multicore CPUs; we’ve now extended it to generate code for Spark, a cluster-based MapReduce-like computing platform.  That means a data scientist can write a single, serial Python program that can run “toy” problems in plain Python, non-toy problems that fit on a single computer in OpenMP or Cilk with good parallel performance, and much larger problems with large datasets on a multi-computer Spark installation.
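To make the "single, serial Python program" idea concrete, here is a minimal plain-Python sketch of the general BLB procedure: draw a few small subsets, resample each one back up to the full data size, and average the per-subset error estimates. This is not the specializer's API or the code from the paper; the names `blb_stderr` and `weighted_mean` and every parameter default are assumptions made up for illustration.

```python
import random
import statistics
from collections import Counter


def blb_stderr(data, estimator, subsets=5, resamples=20, gamma=0.7, seed=0):
    """Estimate the standard error of estimator(values, weights) with BLB.

    Hypothetical helper for illustration only, not the SEJITS specializer API.
    """
    rng = random.Random(seed)
    n = len(data)
    b = max(2, int(n ** gamma))  # size of each "little" subset, b = n^gamma
    per_subset = []
    for _ in range(subsets):
        # Sample b distinct points without replacement.
        subset = rng.sample(data, b)
        estimates = []
        for _ in range(resamples):
            # Resample n points with replacement from the subset, stored
            # compactly as one count (weight) per subset element.
            counts = Counter(rng.choices(range(b), k=n))
            weights = [counts.get(i, 0) for i in range(b)]
            estimates.append(estimator(subset, weights))
        # Spread of the estimator across resamples of this subset.
        per_subset.append(statistics.stdev(estimates))
    # Average the per-subset assessments.
    return statistics.mean(per_subset)


def weighted_mean(values, weights):
    """Mean of values where each value appears weights[i] times."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


if __name__ == "__main__":
    rng = random.Random(42)
    sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    print(blb_stderr(sample, weighted_mean))
```

Each subset's inner loop is independent of the others, which is what makes the algorithm a natural fit for parallel loops on a multicore machine or for a map step on a cluster.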

[Figures: sparkscalingngrams, workflow]

In the paper, we evaluated the performance of the generated Spark BLB code on two applications: an email classifier for the Enron public email corpus, and an estimator of 2-gram word frequency ratios across different decades using data from the Google N-gram dataset (201 GB).  The experiments show strong scaling from 4 to 32 Amazon EC2 nodes (32 to 256 cores).

We are currently working with Dr. Gerald Friedland and others at ICSI to apply this specializer to multimedia classification problems.
