*Note: This paper just appeared as an invited “Research Highlight” in the latest CACM: Feb. 2014, vol. 57, no. 2, pp. 107-114.*

ABSTRACT:

Algorithms have historically been evaluated in terms of the number of arithmetic operations they performed. This analysis is no longer sufficient for predicting running times on today’s machines. Moving data through memory hierarchies and among processors requires much more time (and energy) than performing computations. Hardware trends suggest the relative costs of this communication will only increase. Proving lower bounds on the communication of algorithms and finding algorithms that attain these bounds are therefore fundamental goals. We show that the communication cost of an algorithm is closely related to the graph expansion properties of its corresponding computation graph.

Matrix multiplication is one of the most fundamental problems in scientific computing and in parallel computing. Applying expansion analysis to Strassen’s and other fast matrix multiplication algorithms, we obtain the first lower bounds on their communication costs. These bounds show that the current sequential algorithms are optimal but that previous parallel algorithms communicate more than necessary. Our new parallelization of Strassen’s algorithm is communication-optimal and outperforms all previous matrix multiplication algorithms.
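
For readers who haven’t seen Strassen’s algorithm, here is a minimal sequential sketch of the recursion the abstract refers to, written in Python/NumPy. It is only background, not the paper’s communication-optimal parallelization; the power-of-two size assumption and the `cutoff` fallback to the classical product are illustrative choices of ours, not details from the paper.

```python
import numpy as np

def strassen(A, B, cutoff=64):
    """Multiply square matrices via Strassen's recursion.

    Uses 7 recursive products instead of 8, giving O(n^log2(7))
    arithmetic instead of O(n^3). Assumes n is a power of two and
    falls back to the classical product below `cutoff`.
    """
    n = A.shape[0]
    if n <= cutoff:
        return A @ B  # classical base case

    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]

    # Strassen's seven products
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)

    # Recombine into the four quadrants of C
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C

# Quick sanity check against the classical product
n = 256
A, B = np.random.rand(n, n), np.random.rand(n, n)
assert np.allclose(strassen(A, B), A @ B)
```

The recursive data movement between the quadrants and the seven subproblems is exactly the kind of communication the paper’s expansion analysis bounds from below.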