Core-Count Independent Reproducible Reduce

Abstract:

ecause of rounding errors, parallel floating-point summation can produce different results on different core-counts. For some algorithms like hill climbing, RAxML-NG [7] or greedy algorithms, this implies that results may be irreproducible with different core-counts. We present the Binary Tree Reduction algorithm, which follows a distributed binary tree scheme that keeps the calculation order fixed and independent of the core-count 푝. A naive implementation requires up to (푝 − 1) ∗ (log2 ( 푁 −1 푝 ) + 1) messages to sum 푁 floating-point numbers. To reduce the message count, we introduce a message buffer and optimize data distribution across the cores, the latter results in a runtime decrease of 18 %. We find that for 푝 = 256, Binary Tree Reduction has a slowdown of less than 2 compared to a naive, irreproducible solution. It is able to compute the sum of 푁 ≈ 21 ∗ 106 summands on 푝 = 256 cores in about 248 μs.

SEEK ID: https://publications.h-its.org/publications/1918

Filename: bachelorChristop.pdf 

Format: PDF document

Size: 994 KB

SEEK ID: https://publications.h-its.org/publications/1918

Research Groups: Computational Molecular Evolution

Publication type: Bachelor's Thesis

Citation:

Date Published: 1st Apr 2022

URL:

Registered Mode: manually

Authors: Christoph Stelz, Lukas Hübner, Alexandros Stamatakis

help Submitter
Activity

Views: 104   Downloads: 1

Created: 9th Jan 2025 at 13:27

help Tags

This item has not yet been tagged.

help Attributions

None

Powered by
(v.1.15.2)
Copyright © 2008 - 2024 The University of Manchester and HITS gGmbH