Note: This paper just appeared as an invited "Research Highlight" in the latest CACM: Feb. 2014, vol. 57, no. 2, pp. 107–114.
ABSTRACT:
Algorithms have historically been evaluated in terms of the number of arithmetic operations they performed. This analysis is no longer sufficient for predicting running times on today's machines. Moving data through memory hierarchies and among processors requires much more time (and energy) than performing computations. Hardware trends suggest the relative costs of this communication will only increase. Proving lower bounds on the communication of algorithms and finding algorithms that attain these bounds are therefore fundamental goals. We show that the communication cost of an algorithm is closely related to the graph expansion properties of its corresponding computation graph.
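For concreteness, the sequential bounds in question can be summarized as follows (a hedged restatement: M denotes the fast-memory size, the classical-case bound is the well-known Hong–Kung result, and the Strassen-case bound is the one derived via expansion analysis in this line of work):

```latex
% Words moved between fast memory (size M) and slow memory
% when multiplying two n-by-n matrices:
W_{\text{classical}} = \Omega\!\left( \frac{n^3}{\sqrt{M}} \right)
\qquad
W_{\text{Strassen}} = \Omega\!\left( \left( \frac{n}{\sqrt{M}} \right)^{\log_2 7} M \right)
```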
Matrix multiplication is one of the most fundamental problems in scientific computing and in parallel computing. Applying expansion analysis to Strassen's and other fast matrix multiplication algorithms, we obtain the first lower bounds on their communication costs. These bounds show that the current sequential algorithms are optimal but that previous parallel algorithms communicate more than necessary. Our new parallelization of Strassen's algorithm is communication-optimal and outperforms all previous matrix multiplication algorithms.
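To make the object of study concrete, here is a minimal Python sketch of Strassen's recursion itself (the classical seven-multiplication scheme, not the paper's new communication-optimal parallelization); the power-of-two size assumption and the cutoff parameter are simplifications for illustration:

```python
import numpy as np

def strassen(A, B, cutoff=64):
    """Multiply square matrices A and B via Strassen's recursion.

    A and B are assumed to be n x n with n a power of two (a common
    simplification; real implementations pad or peel odd dimensions).
    Below the cutoff, fall back to the classical product.
    """
    n = A.shape[0]
    if n <= cutoff:
        return A @ B

    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]

    # Seven recursive products instead of eight -- the source of the
    # O(n^{log2 7}) arithmetic count.
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)

    # Reassemble the four quadrants of the result.
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```

A quick sanity check is np.allclose(strassen(A, B), A @ B) for random A and B of matching power-of-two size.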