We derive a new parallel communication-avoiding matrix
powers algorithm for matrices of the form A = D + USV^H, where D is
sparse and USV^H has low rank and is possibly dense. We demonstrate
that, with respect to the cost of computing k sparse matrix-vector multiplications,
our algorithm asymptotically reduces the parallel latency by
a factor of O(k) for small additional bandwidth and computation costs.
Using problems from real-world applications, our performance model
predicts up to 13× speedups on petascale machines.
(Appeared in Parallel Processing and Applied Mathematics, Springer Lecture Notes in Computer Science, 2014. Also available as tech report UCB/EECS-2013-47.)
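As a rough illustration of why the splitting helps (a hypothetical NumPy sketch, not the paper's communication-avoiding algorithm), each of the k matrix-vector products with A = D + USV^H can be applied as one sparse product plus O(nr) low-rank work, without ever forming the dense matrix A:

```python
import numpy as np

# Hypothetical sketch: apply A = D + U S V^H via its parts.
# D is the sparse term (tridiagonal here, for illustration);
# U S V^H is a rank-r correction that may be dense if formed.
rng = np.random.default_rng(0)
n, r, k = 8, 2, 4

D = (np.diag(2.0 * np.ones(n))
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1))
U = rng.standard_normal((n, r))
S = rng.standard_normal((r, r))
V = rng.standard_normal((n, r))
x = rng.standard_normal(n)

# Matrix powers basis [x, Ax, A^2 x, ..., A^k x]:
# each step is one sparse mat-vec plus two skinny products.
vecs = [x]
for _ in range(k):
    y = vecs[-1]
    vecs.append(D @ y + U @ (S @ (V.conj().T @ y)))

# Check against applying the explicitly formed dense A.
A = D + U @ S @ V.conj().T
ref = x.copy()
for _ in range(k):
    ref = A @ ref
assert np.allclose(vecs[-1], ref)
```

In the sketch the dense A is formed only to verify the result; the point of the splitting is that the loop itself never needs it.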