## Publications

- Avoiding Communication in Primal and Dual Block Coordinate Descent Methods
- Low Rank Approximation of a Sparse Matrix Based on LU Factorization with Column and Row Tournament Pivoting
- Efficient Reproducible Floating Point Summation and BLAS
- Reproducible Tall-Skinny QR
- CA-SVM: Communication-Avoiding Parallel Support Vector Machines on Distributed Systems
- Matrix Multiplication Algorithm Selection with Support Vector Machines
- Write-Avoiding Algorithms
- FRPA: A Framework for Recursive Parallel Algorithms
- Reconstructing Householder Vectors from Tall-Skinny QR
- Communication Avoiding Rank Revealing QR Factorization with Column Pivoting
- Contracting Symmetric Tensors Using Fewer Multiplications
- Communication Lower Bounds for Tensor Contraction Algorithms
- A Residual Replacement Strategy for Improving the Maximum Attainable Accuracy of s-step Krylov Subspace Methods
- S-Step Krylov Subspace Methods as Bottom Solvers for Geometric Multigrid
- A Massively Parallel Tensor Contraction Framework for Coupled-Cluster Computations
- Accuracy of the S-Step Lanczos Method for the Symmetric Eigenproblem
- Contention Bounds for Combinations of Computation Graphs and Network Topologies
- Parallel Reproducible Summation
- Avoiding Communication in Successive Band Reduction
- Communication Lower Bounds and Optimal Algorithms for Numerical Linear Algebra
- Analysis of the finite precision s-step biconjugate gradient method
- Error Analysis of the s-step Lanczos Process in Finite Precision
- Exploiting Data Sparsity in Parallel Matrix Powers Computations
- Communication-Avoiding Symmetric Indefinite Factorization
- Tradeoffs between synchronization, communication, and work in parallel linear algebra computations
- Perfect Strong Scaling Using No Additional Energy
- Precimonious: Tuning Assistant for Floating-Point Precision
- Communication Costs of Strassen's Matrix Multiplication
- Communication Lower Bounds and Optimal Algorithms for Programs That Reference Arrays, Part 1
- Communication Efficient Gaussian Elimination with Partial Pivoting using a Shape Morphing Data Layout
- Communication Optimal Parallel Multiplication of Sparse Random Matrices
- Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication
- Implementing a Blocked Aasen's Algorithm with a Dynamic Scheduler on Multicore Architectures (Best Paper Award)
- Cyclops Tensor Framework: reducing communication and eliminating load imbalance in massively parallel contractions
- Minimizing communication in all-pairs shortest-paths