We derive a new parallel communication-avoiding matrix powers algorithm for matrices of the form A = D + USV^H, where D is sparse and USV^H has low rank and is possibly dense. We demonstrate that, with respect to the cost of computing k sparse matrix-vector multiplications, our algorithm asymptotically reduces the parallel latency by a factor of O(k) for small additional bandwidth and computation costs. Using problems from real-world applications, our performance model predicts up to 13× speedups on petascale machines.

*(Appeared in Parallel Processing and Applied Mathematics (Springer series on Lecture Notes in Computer Science), 2014. Also tech report UCB/EECS-2013-47.)*
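The splitting A = D + USV^H can be exploited when computing the matrix powers [x, Ax, ..., A^k x]: each application of A costs one sparse matvec with D plus two skinny dense products with the low-rank factors, rather than a dense matvec with A. A minimal NumPy/SciPy sketch of this idea, using random illustrative data (this shows only the structure exploitation, not the paper's communication-avoiding algorithm):

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
n, r, k = 8, 2, 3  # matrix size, rank of the dense part, number of powers

# Sparse part D and low-rank part U S V^H (illustrative random data).
D = sp.random(n, n, density=0.3, random_state=0, format="csr")
U = rng.standard_normal((n, r))
S = np.diag(rng.standard_normal(r))
V = rng.standard_normal((n, r))

x = rng.standard_normal(n)

# Compute [x, Ax, ..., A^k x] with A = D + U S V^H applied implicitly:
# one sparse matvec plus two n-by-r dense products per step.
vecs = [x]
for _ in range(k):
    y = vecs[-1]
    vecs.append(D @ y + U @ (S @ (V.conj().T @ y)))

# Sanity check against forming A explicitly.
A = D.toarray() + U @ S @ V.conj().T
assert np.allclose(vecs[k], np.linalg.matrix_power(A, k) @ x)
```

Avoiding the explicit (possibly dense) A keeps the per-step cost close to that of a sparse matvec whenever r is small.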