Publications

What is a Publication?
13 Publications matching the given criteria: (Clear all filters)
Published year: 202413

Abstract

Not specified

Authors: Alexander Suhrkamp, Alexandros Stamatakis

Date Published: 1st Dec 2024

Publication Type: Master's Thesis

Abstract (Expand)

The Message-Passing Interface (MPI) and C++ form the backbone of high-performance computing, but MPI only provides C and Fortran bindings. While this offers great language interoperability, high-level programming languages like C++ make software development quicker and less error-prone.We propose novel C++language bindings that cover all abstraction levels from low-level MPI calls to convenient STL-style bindings, where most parameters are inferred from a small subset of parameters, by bringing named parameters to C++. This enables rapid prototyping and fine-tuning runtime behavior and memory management. A flexible type system and additional safety guarantees help to prevent programming errors.By exploiting C++’s template metaprogramming capabilities, this has (near) zero overhead, as only required code paths are generated at compile time.We demonstrate that our library is a strong foundation for a future distributed standard library using multiple application benchmarks, ranging from text-book sorting algorithms to phylogenetic interference.

Authors: Tim Niklas Uhl, Matthias Schimek, Lukas Hübner, Demian Hespe, Florian Kurpicz, Daniel Seemaier, Christoph Stelz, Peter Sanders

Date Published: 17th Nov 2024

Publication Type: Proceedings

Abstract

Not specified

Authors: Eric Laudemann, Alexandros Stamatakis

Date Published: 1st Oct 2024

Publication Type: Master's Thesis

Abstract

Not specified

Authors: Erik Borker, Alexandros Stamatakis

Date Published: 1st Sep 2024

Publication Type: Master's Thesis

Abstract (Expand)

In the field of population genetics, the driving forces of evolution within species can be studied with trees. Along a genome, each tree describes the local ancestries of a small genomic region. Together, those trees form a tree sequence that describes the ancestry of a population at every site of the sequence. Inferring tree sequences for whole genomes with many haplotype samples is a computationally expensive task, however. The state-of-the-art tool to infer tree sequences is tsinfer, which infers ancestries for human chromosomes from 5000 samples within a few hours. The tool has the capability to parallelize the computation, but we identify a structure in the input data that limits its parallelizability. We propose a novel parallelization scheme aiming to improve scaling at high thread counts, independently of this structure. Furthermore, we propose several optimizations for the inference algorithm, improving cache efficiency and reducing the number of operations per iteration. We provide a proof-of-concept implementation, and compare the computation speed of our implementation and tsinfer. When inferring ancestries for the 1000 Genomes Project, our implementation is consistently faster by a factor of 1.9 to 2.4. Additionally, depending on the choice of parameters, our parallelization scheme scales better between 32 and 96 cores, improving its speed advantage, especially at higher core counts. In phases where our novel parallelization scheme does not apply, our optimizations still improve the runtime by a factor of 2.2. As available genomic data sets are growing rapidly in size, our contribution decreases the computation time and enables better parallelization, allowing the processing of larger data sets in reasonable time frames

Authors: Johannes Hengstler, Lukas Hübner, Alexandros Stamatakis

Date Published: 1st Aug 2024

Publication Type: Master's Thesis

Abstract (Expand)

Working with cognate data involves handling synonyms, that is, multiple words that describe the same concept in a language. In the early days of language phylogenetics it was recommended to select one synonym only. However, as we show here, binary character matrices, which are used as input for computational methods, do allow for representing the entire dataset including all synonyms. Here we address the question how one can and if one should include all synonyms or whether it is preferable to select synonyms a priori. To this end, we perform maximum likelihood tree inferences with the widely used RAxML-NG tool and show that it yields plausible trees when all synonyms are used as input. Furthermore, we show that a priori synonym selection can yield topologically substantially different trees and we therefore advise against doing so. To represent cognate data including all synonyms, we introduce two types of character matrices beyond the standard binary ones: probabilistic binary and probabilistic multi-valued character matrices. We further show that it is dataset-dependent for which character matrix type the inferred RAxML-NG tree is topologically closest to the gold standard. We also make available a Python interface for generating all of the above character matrix types for cognate data provided in CLDF format.

Authors: Luise Häuser, Gerhard Jäger, Alexandros Stamatakis

Date Published: 28th Jun 2024

Publication Type: Proceedings

Abstract (Expand)

The Message-Passing Interface (MPI) and C++ form the backbone of high-performance computing and algorithmic research in the field of distributed-memory computing, but MPI only provides C and Fortran bindings. This provides good language interoperability, but higher-level programming languages make development quicker and less error-prone. We propose novel C++ language bindings designed to cover the whole range of abstraction levels from low-level MPI calls to convenient STL-style bindings, where most parameters are inferred from a small subset of the full parameter set. This allows for both rapid prototyping and fine-tuning of distributed code with predictable runtime behavior and memory management. Using template-metaprogramming, only code paths required for computing missing parameters are generated at compile time, which results in (near) zero-overhead bindings.

Authors: Kunal Agrawal, Erez Petrank, Demian Hespe, Lukas Hübner, Florian Kurpicz, Peter Sanders, Matthias Schimek, Daniel Seemaier, Tim Niklas Uhl

Date Published: 17th Jun 2024

Publication Type: Proceedings

Powered by
(v.1.16.0)
Copyright © 2008 - 2024 The University of Manchester and HITS gGmbH