Parallel Reproducible Summation

Reproducibility, i.e. getting bitwise identical floating point results from multiple runs of the same program, is a property
that many users depend on either for debugging or correctness checking in many codes [10]. However, the combination of dynamic
scheduling of parallel computing resources, and floating point nonassociativity, makes attaining reproducibility a challenge even for
simple reduction operations like computing the sum of a vector of numbers in parallel. We propose a technique for floating point
summation that is reproducible independent of the order of summation. Our technique uses Rump’s algorithm for error-free vector
transformation [7], and is much more efficient than using (possibly very) high precision arithmetic. Our algorithm reproducibly computes
highly accurate results with an absolute error bound of n  2