ecause of rounding errors, parallel floating-point summation can produce different results on different core-counts. For some algorithms like hill climbing, RAxML-NG [7] or greedy algorithms, this implies that results may be irreproducible with different core-counts. We present the Binary Tree Reduction algorithm, which follows a distributed binary tree scheme that keeps the calculation order fixed and independent of the core-count 푝. A naive implementation requires up to (푝 − 1) ∗ (log2 ( 푁 −1 푝 ) + 1) messages to sum 푁 floating-point numbers. To reduce the message count, we introduce a message buffer and optimize data distribution across the cores, the latter results in a runtime decrease of 18 %. We find that for 푝 = 256, Binary Tree Reduction has a slowdown of less than 2 compared to a naive, irreproducible solution. It is able to compute the sum of 푁 ≈ 21 ∗ 106 summands on 푝 = 256 cores in about 248 μs.
SEEK ID: https://publications.h-its.org/publications/1918
Filename: bachelorChristop.pdf
Format: PDF document
Size: 994 KB
SEEK ID: https://publications.h-its.org/publications/1918
Research Groups: Computational Molecular Evolution
Publication type: Bachelor's Thesis
Views: 104 Downloads: 1
Created: 9th Jan 2025 at 13:27
This item has not yet been tagged.
None