The traditional metric for the efficiency of a numerical algorithm has been
the number of arithmetic operations it performs. Technological trends have
long been reducing the time to perform an arithmetic operation, so it is no
longer the bottleneck in many algorithms; rather, communication, or moving
data, is the bottleneck. This motivates us to seek algorithms that move as
little data as possible, either between levels of a memory hierarchy or be-
tween parallel processors over a network. In this paper we summarize recent
progress in three aspects of this problem. First we describe lower bounds on
communication. Some of these generalize known lower bounds for dense clas-
sical (O(n3)) matrix multiplication to all direct methods of linear algebra, to
sequential and parallel algorithms, and to dense and sparse matrices. We also
present lower bounds for Strassen-like algorithms, and for iterative methods,
in particular Krylov subspace methods applied to sparse matrices. Second, we
compare these lower bounds to widely used versions of these algorithms, and
note that these widely used algorithms usually communicate asymptotically
more than is necessary. Third, we identify or invent new algorithms for most
linear algebra problems that do attain these lower bounds, and demonstrate
large speed-ups in theory and practice.