Publications

Abstract

This thesis develops and tests Spearfish, a new method for distance-based gene tree inference. Spearfish uses the pairwise distances between the gene sequences, together with the distances between the corresponding species in the species tree, in a clustering procedure to reconstruct 10 gene trees. The best tree is then selected using a statistical evaluation procedure. Across all simulated datasets tested, the trees inferred by Spearfish have an average distance of 0.213 to the true gene tree. This makes Spearfish 2.18 times more accurate than methods such as RAxML-NG, which do not take the species tree into account. Spearfish is 25.85% less accurate but 49.63% faster than GeneRax, one of the leading methods that correct gene trees using their species tree. Spearfish can therefore be used to reconstruct starting trees for GeneRax or, for large datasets, even to replace it.

Authors: Lukas Knirsch, Benoit Morel, Alexandros Stamatakis

Date Published: 2nd Oct 2025

Publication Type: Bachelor's Thesis

Abstract

Not specified

Authors: Alexander Suhrkamp, Alexandros Stamatakis

Date Published: 1st Dec 2024

Publication Type: Master's Thesis

Abstract

The Message-Passing Interface (MPI) and C++ form the backbone of high-performance computing, but MPI only provides C and Fortran bindings. While this offers great language interoperability, high-level programming languages like C++ make software development quicker and less error-prone. We propose novel C++ language bindings that cover all abstraction levels from low-level MPI calls to convenient STL-style bindings, where most parameters are inferred from a small subset of parameters, by bringing named parameters to C++. This enables rapid prototyping and fine-tuning runtime behavior and memory management. A flexible type system and additional safety guarantees help to prevent programming errors. By exploiting C++'s template metaprogramming capabilities, this has (near) zero overhead, as only required code paths are generated at compile time. We demonstrate that our library is a strong foundation for a future distributed standard library using multiple application benchmarks, ranging from textbook sorting algorithms to phylogenetic inference.

Authors: Tim Niklas Uhl, Matthias Schimek, Lukas Hübner, Demian Hespe, Florian Kurpicz, Daniel Seemaier, Christoph Stelz, Peter Sanders

Date Published: 17th Nov 2024

Publication Type: Proceedings

Abstract

Not specified

Authors: Eric Laudemann, Alexandros Stamatakis

Date Published: 1st Oct 2024

Publication Type: Master's Thesis

Abstract

Not specified

Authors: Erik Borker, Alexandros Stamatakis

Date Published: 1st Sep 2024

Publication Type: Master's Thesis

Abstract

In the field of population genetics, the driving forces of evolution within species can be studied with trees. Along a genome, each tree describes the local ancestries of a small genomic region. Together, those trees form a tree sequence that describes the ancestry of a population at every site of the sequence. Inferring tree sequences for whole genomes with many haplotype samples is a computationally expensive task, however. The state-of-the-art tool to infer tree sequences is tsinfer, which infers ancestries for human chromosomes from 5000 samples within a few hours. The tool has the capability to parallelize the computation, but we identify a structure in the input data that limits its parallelizability. We propose a novel parallelization scheme aiming to improve scaling at high thread counts, independently of this structure. Furthermore, we propose several optimizations for the inference algorithm, improving cache efficiency and reducing the number of operations per iteration. We provide a proof-of-concept implementation and compare the computation speed of our implementation and tsinfer. When inferring ancestries for the 1000 Genomes Project, our implementation is consistently faster by a factor of 1.9 to 2.4. Additionally, depending on the choice of parameters, our parallelization scheme scales better between 32 and 96 cores, improving its speed advantage, especially at higher core counts. In phases where our novel parallelization scheme does not apply, our optimizations still improve the runtime by a factor of 2.2. As available genomic data sets are growing rapidly in size, our contribution decreases the computation time and enables better parallelization, allowing the processing of larger data sets in reasonable time frames.

Authors: Johannes Hengstler, Lukas Hübner, Alexandros Stamatakis

Date Published: 1st Aug 2024

Publication Type: Journal

Abstract

Working with cognate data involves handling synonyms, that is, multiple words that describe the same concept in a language. In the early days of language phylogenetics, it was recommended to select one synonym only. However, as we show here, binary character matrices, which are used as input for computational methods, do allow for representing the entire dataset including all synonyms. Here we address the question of how one can, and whether one should, include all synonyms, or whether it is preferable to select synonyms a priori. To this end, we perform maximum likelihood tree inferences with the widely used RAxML-NG tool and show that it yields plausible trees when all synonyms are used as input. Furthermore, we show that a priori synonym selection can yield topologically substantially different trees, and we therefore advise against doing so. To represent cognate data including all synonyms, we introduce two types of character matrices beyond the standard binary ones: probabilistic binary and probabilistic multi-valued character matrices. We further show that which character matrix type yields the RAxML-NG tree topologically closest to the gold standard depends on the dataset. We also make available a Python interface for generating all of the above character matrix types for cognate data provided in CLDF format.

Authors: Luise Häuser, Gerhard Jäger, Alexandros Stamatakis

Date Published: 28th Jun 2024

Publication Type: Proceedings
