Reproducibility, i.e. getting bitwise identical floating point results from multiple runs of the same program, is a property that many users depend on either for debugging or correctness checking in many codes [10]. However, the combination of dynamic scheduling of parallel computing resources and floating point non-associativity makes attaining reproducibility a challenge even for simple reduction operations like computing the sum of a vector of numbers in parallel. We propose a technique for floating point summation that is reproducible independent of the order of summation. Our technique uses Rump's algorithm for error-free vector transformation [7], and is much more efficient than using (possibly very) high precision arithmetic. Our algorithm reproducibly computes highly accurate results with an absolute error bound of n²

## Publications

## Tags

2D
Accelerators
Algorithms
Architectures
Arrays
Big Data
Bootstrapping
C++
Cache Partitioning
Cancer
Careers
Chisel
Communication
Computer Architecture
CTF
DIABLO
Efficiency
Energy
FPGA
GAP
Gaussian Elimination
Genomics
GPU
Hardware
HLS
Lower Bounds
LU
Matrix Multiplication
Memory
Multicore
Oblivious
Open Space
OS
Parallelism
Parallel Reduction
Performance
PHANTOM
Processors
Python
Research Centers
RISC-V
SEJITS
Tall-Skinny QR
Technical Report
Test generation