ASPIRE “BEST OF” Papers

Highlights of the ASPIRE Literature

The ASPIRE project produced a wealth of publications over the course of five years. While a more complete list of ASPIRE-sponsored papers can be found elsewhere, we present some of the more significant ones here, organized by topic.


RISC-V Architecture and Implementations

 

The RISC-V Instruction Set Manual, Volume I: User-Level ISA, Version 2.0. 

By Andrew Waterman, Yunsup Lee, David Patterson, and Krste Asanović.  May 2014.

The Rocket Chip Generator

By Krste Asanović, Rimas Avižienis, Jonathan Bachrach, Scott Beamer, David Biancolin,
Christopher Celio, Henry Cook, Palmer Dabbelt, John Hauser, Adam Izraelevitz,
Sagar Karandikar, Benjamin Keller, Donggyu Kim, John Koenig, Yunsup Lee, Eric Love,
Martin Maas, Albert Magyar, Howard Mao, Miquel Moreto, Albert Ou, David Patterson,
Brian Richards, Colin Schmidt, Stephen Twigg, Huy Vo, Andrew Waterman. 

 

The Berkeley Out-of-Order Machine (BOOM): An Industry-Competitive, Synthesizable, Parameterized RISC-V Processor

By Christopher Celio, David Patterson, and Krste Asanović.

 

The Renewed Case for the Reduced Instruction Set Computer: Avoiding ISA Bloat with Macro-Op Fusion for RISC-V.

By Christopher Celio, Palmer Dabbelt, David Patterson, and Krste Asanović.

 


Architecture: Hardware Design, Chisel, HLS

 

Strober: Fast and Accurate Sample-Based Energy Simulation for Arbitrary RTL

By Donggyu Kim, Adam Izraelevitz, Christopher Celio, Hokeun Kim, Brian Zimmer, Yunsup Lee, Jonathan Bachrach, and Krste Asanović.  ISCA 2016.

 

Architectural Synthesis of Computational Pipelines with Decoupled Memory Access

By Shaoyi Cheng and John Wawrzynek.  FPGA 2014.

 

Hardware Reusability is FIRRTL Ground: Hardware Construction Languages, Compiler Frameworks, and Transformations

By Adam Izraelevitz, Jack Koenig, Patrick S. Li, Richard Lin, Angie Wang, Albert Magyar, Donggyu Kim, Colin Schmidt, Chick Markley, Jim Lawson, Jonathan Bachrach

 

Specification for the FIRRTL Language

By Patrick S. Li, Adam M. Izraelevitz and Jonathan Bachrach

 


Circuits: Raven, Hurricane, CRAFT, RRAM, Silicon Photonics

 

Single-chip microprocessor that communicates directly using light

By Chen Sun, Mark T. Wade, Yunsup Lee, Jason S. Orcutt, Luca Alloatti, Michael S. Georgas, Andrew S. Waterman, Jeffrey M. Shainline, Rimas R. Avižienis, Sen Lin, Benjamin R. Moss, Rajesh Kumar, Fabio Pavanello, Amir H. Atabaki, Henry M. Cook, Albert J. Ou, Jonathan C. Leu, Yu-Hsin Chen, Krste Asanović, Rajeev J. Ram, Miloš A. Popović, and Vladimir M. Stojanović.

 

A RISC-V Vector Processor with Tightly-Integrated Switched-Capacitor DC-DC Converters in 28nm FDSOI

By Brian Zimmer, Yunsup Lee, Alberto Puggelli, Jaehwa Kwak, Ruzica Jevtic, Ben Keller, Stevo Bailey,
Milovan Blagojevic, Pi-Feng Chiu, Hanh-Phuc Le, Po-Hung Chen, Nicholas Sutardja, Rimas Avižienis,
Andrew Waterman, Brian Richards, Philippe Flatresse, Elad Alon, Krste Asanović, and Borivoje Nikolić.  Symposium on VLSI Circuits, 2015.

 

Reprogrammable redundancy for cache Vmin reduction in a 28nm RISC-V processor

By Brian Zimmer, Pi-Feng Chiu, Borivoje Nikolić, and Krste Asanović.  IEEE Asian Solid-State Circuits Conference (A-SSCC).

 

A Double-Tail Sense Amplifier for Low-Voltage SRAM in 28nm Technology

By Pi-Feng Chiu, Brian Zimmer, and Borivoje Nikolić.  IEEE Asian Solid-State Circuits Conference (A-SSCC).

 


Hardware Security

PHANTOM: Practical Oblivious Computation in a Secure Processor

By Martin Maas, Eric Love, Emil Stefanov, Mohit Tiwari, Elaine Shi, Krste Asanović, John Kubiatowicz, and Dawn Song.  ACM Conference on Computer and Communications Security (CCS ’13).

 

GhostRider: A Hardware-Software System for Memory Trace Oblivious Computation (Best Paper Award)

By Chang Liu, Austin Harris, Martin Maas, Michael Hicks, Mohit Tiwari, and Elaine Shi.  ASPLOS 2015.

 


Communication-Avoiding Algorithms

 

Communication Lower Bounds and Optimal Algorithms for Programs That Reference Arrays

By Michael Christ, James Demmel, Nicholas Knight, Thomas Scanlon, and Katherine Yelick.

 

Write-Avoiding Algorithms

By Erin Carson, James Demmel, Laura Grigori, Nicholas Knight, Penporn Koanantakool, Oded Schwartz, and Harsha Vardhan Simhadri.  IPDPS 2016.

 

Perfect Strong Scaling Using No Additional Energy

By James Demmel, Andrew Gearhart, Benjamin Lipshitz, and Oded Schwartz.  IPDPS 2013.

 

Precimonious: tuning assistant for floating-point precision

By Cindy Rubio-González, Cuong Nguyen, Hong Diep Nguyen, James Demmel, William Kahan, Koushik Sen, David H. Bailey, Costin Iancu, and David Hough.  SC’13.

 

Efficient Reproducible Floating Point Summation and BLAS

By Peter Ahrens, Hong Diep Nguyen and James Demmel.

 

CA-SVM: Communication-Avoiding Support Vector Machines on Distributed Systems

By Yang You, James Demmel, Kenneth Czechowski, Le Song, and Richard Vuduc.  IPDPS 2015 (Best Paper).

 

A Residual Replacement Strategy for Improving the Maximum Attainable Accuracy of $s$-Step Krylov Subspace Methods

By Erin Carson and James Demmel.  SIAM Journal on Matrix Analysis and Applications.

 


Graph Processing

 

Locality Exists in Graph Processing: Workload Characterization on an Ivy Bridge Server

By Scott Beamer, Krste Asanović, and David Patterson.

 

Distributed Memory Breadth-First Search Revisited: Enabling Bottom-Up Search

By Scott Beamer, Aydın Buluç, Krste Asanović, and David Patterson.

 


Machine Learning and Computer Vision

 

SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and <0.5MB Model Size

By Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, and Kurt Keutzer.  ICLR 2017.

 

Keynote ESWEEK 2017: Small Neural Nets Are Beautiful: Enabling Embedded Systems with Small Deep-Neural-Network Architectures

By Forrest Iandola and Kurt Keutzer.

 

SqueezeDet: Unified, Small, Low Power Fully Convolutional Neural Networks for Real-Time Object Detection for Autonomous Driving

By Bichen Wu, Alvin Wan, Forrest Iandola, Peter H. Jin, and Kurt Keutzer.

 

FireCaffe: near-linear acceleration of deep neural network training on compute clusters

By Forrest Iandola, Matthew W. Moskewicz, Khalid Ashraf, and Kurt Keutzer.

 


SEJITS and Specializers

An Extensible Framework for Composing Stencils with Common Scientific Computing Patterns

By Leonard Truong, Chick Markley, and Armando Fox.  Workshop on Stencil Computations 2014.

 

Snowflake: A Lightweight Portable Stencil DSL

By Nathan Zhang, Michael Driscoll, Armando Fox, Charles Markley, Samuel Williams, and Protonu Basu.

 

Latte: A Language, Compiler, and Runtime for Elegant and Efficient Deep Neural Networks

By Leonard Truong, Rajkishore Barik, Ehsan Totoni, Hai Liu, Chick Markley, Armando Fox, and Tatiana Shpeisman.