Communication Lower Bounds and Optimal Algorithms for Numerical Linear Algebra

The traditional metric for the efficiency of a numerical algorithm has been the number of arithmetic operations it performs. Technological trends have long been reducing the time to perform an arithmetic operation, so it is no longer the bottleneck in many algorithms; rather, communication, or moving data, is the bottleneck. This motivates us to seek algorithms that move as little data as possible, either between levels of a memory hierarchy or between parallel processors over a network. In this paper we summarize recent progress in three aspects of this problem. First we describe lower bounds on communication. Some of these generalize known lower bounds for dense classical (O(n³)) matrix multiplication to all direct methods of linear algebra, to sequential and parallel algorithms, and to dense and sparse matrices. We also present lower bounds for Strassen-like algorithms, and for iterative methods, in particular Krylov subspace methods applied to sparse matrices. Second, we compare these lower bounds to widely used versions of these algorithms, and note that these widely used algorithms usually communicate asymptotically more than is necessary. Third, we identify or invent new algorithms for most linear algebra problems that do attain these lower bounds, and demonstrate large speed-ups in theory and practice.