Reconstructing Householder Vectors from Tall-Skinny QR

The Tall-Skinny QR (TSQR) algorithm is more communication effi_cient than the standard

Householder algorithm for QR decomposition of matrices with many more rows than columns.

However, TSQR produces a different representation of the orthogonal factor and therefore

requires more software development to support the new representation. Further, implicitly

applying the orthogonal factor to the trailing matrix in the context of factoring a square matrix

is more complicated and costly than with the Householder representation.

We show how to perform TSQR and then reconstruct the Householder vector representation with

the same asymptotic communication effi_ciency and little extra computational cost. We demonstrate

the high performance and numerical stability of this algorithm both theoretically and empirically. The

new Householder reconstruction algorithm allows us to design more effi_cient parallel QR algorithms,

with significantly lower latency cost compared to Householder QR and lower bandwidth and latency

costs compared with Communication-Avoiding QR (CAQR) algorithm. As a result, our final parallel

QR algorithm outperforms ScaLAPACK and Elemental implementations of Householder QR and our

implementation of CAQR on the Hopper Cray XE6 NERSC system.


Reconstructing Householder Vectors from Tall-Skinny QR