Publications

Abstract

This thesis develops and tests Spearfish, a new method for distance-based inference of gene trees. Spearfish uses the pairwise distances between the gene sequences, together with the distances between the corresponding species in the species tree, in a clustering procedure to reconstruct 10 gene trees. The best tree is then selected using a statistical evaluation procedure. Across all simulated datasets tested, the trees inferred by Spearfish have an average distance of 0.213 to the true gene tree. This makes it 2.18 times more accurate than methods such as RAxML-NG, which do not take the species tree into account. Spearfish is 25.85% less accurate but 49.63% faster than GeneRax, one of the leading methods that correct gene trees using their species tree. Spearfish can therefore be used to reconstruct starting trees for GeneRax or, for large datasets, even to replace it.

Authors: Lukas Knirsch, Benoit Morel, Alexandros Stamatakis

Date Published: 2nd Oct 2025

Publication Type: Bachelor's Thesis

Abstract

Not specified

Authors: Alexander Suhrkamp, Alexandros Stamatakis

Date Published: 1st Dec 2024

Publication Type: Master's Thesis

Abstract

The Message-Passing Interface (MPI) and C++ form the backbone of high-performance computing, but MPI only provides C and Fortran bindings. While this offers great language interoperability, high-level programming languages like C++ make software development quicker and less error-prone. We propose novel C++ language bindings that cover all abstraction levels from low-level MPI calls to convenient STL-style bindings, where most parameters are inferred from a small subset of parameters, by bringing named parameters to C++. This enables rapid prototyping and fine-tuning of runtime behavior and memory management. A flexible type system and additional safety guarantees help to prevent programming errors. By exploiting C++'s template metaprogramming capabilities, this has (near) zero overhead, as only required code paths are generated at compile time. We demonstrate that our library is a strong foundation for a future distributed standard library using multiple application benchmarks, ranging from textbook sorting algorithms to phylogenetic inference.

Authors: Tim Niklas Uhl, Matthias Schimek, Lukas Hübner, Demian Hespe, Florian Kurpicz, Daniel Seemaier, Christoph Stelz, Peter Sanders

Date Published: 17th Nov 2024

Publication Type: Proceedings

Abstract

Not specified

Authors: Eric Laudemann, Alexandros Stamatakis

Date Published: 1st Oct 2024

Publication Type: Master's Thesis

Abstract

Not specified

Authors: Erik Borker, Alexandros Stamatakis

Date Published: 1st Sep 2024

Publication Type: Master's Thesis

Abstract

In the field of population genetics, the driving forces of evolution within species can be studied with trees. Along a genome, each tree describes the local ancestries of a small genomic region. Together, those trees form a tree sequence that describes the ancestry of a population at every site of the sequence. Inferring tree sequences for whole genomes with many haplotype samples is a computationally expensive task, however. The state-of-the-art tool to infer tree sequences is tsinfer, which infers ancestries for human chromosomes from 5000 samples within a few hours. The tool has the capability to parallelize the computation, but we identify a structure in the input data that limits its parallelizability. We propose a novel parallelization scheme aiming to improve scaling at high thread counts, independently of this structure. Furthermore, we propose several optimizations for the inference algorithm, improving cache efficiency and reducing the number of operations per iteration. We provide a proof-of-concept implementation, and compare the computation speed of our implementation and tsinfer. When inferring ancestries for the 1000 Genomes Project, our implementation is consistently faster by a factor of 1.9 to 2.4. Additionally, depending on the choice of parameters, our parallelization scheme scales better between 32 and 96 cores, improving its speed advantage, especially at higher core counts. In phases where our novel parallelization scheme does not apply, our optimizations still improve the runtime by a factor of 2.2. As available genomic data sets are growing rapidly in size, our contribution decreases the computation time and enables better parallelization, allowing the processing of larger data sets in reasonable time frames.

Authors: Johannes Hengstler, Lukas Hübner, Alexandros Stamatakis

Date Published: 1st Aug 2024

Publication Type: Journal

Abstract

Maximum Likelihood (ML) based phylogenetic inference constitutes a challenging optimization problem. Given a set of aligned input sequences, phylogenetic inference tools strive to determine the tree topology, the branch lengths, and the evolutionary parameters that maximize the phylogenetic likelihood function. However, there exist compelling reasons to not push optimization to its limits, by means of early, yet adequate stopping criteria. Since input sequences are typically subject to stochastic and systematic noise, one should exhibit caution regarding (over-)optimization and the inherent risk of overfitting the model to noisy input data. To this end, we propose, implement, and evaluate four statistical early stopping criteria in RAxML-NG that evade excessive and compute-intensive (over-)optimization. These generic criteria can seamlessly be integrated into other phylogenetic inference tools while not decreasing tree accuracy. The first two criteria quantify input data-specific sampling noise to derive a stopping threshold. The third employs the Kishino-Hasegawa (KH) test to statistically assess the significance of differences between intermediate trees before and after major optimization steps in RAxML-NG. The optimization terminates early when improvements are insignificant. The fourth method utilizes multiple testing correction in the KH test. We show that all early stopping criteria infer trees that are statistically equivalent compared to inferences without early stopping. In conjunction with a necessary simplification of the standard RAxML-NG tree search heuristic, the average inference times on empirical and simulated datasets are ∼3.5 and ∼1.8 times faster, respectively, than for standard RAxML-NG v.1.2. The four stopping criteria have been implemented in RAxML-NG and are available as open source code under GNU GPL at https://github.com/togkousa/raxml-ng.

Authors: Anastasis Togkousidis, Alexandros Stamatakis, Olivier Gascuel

Date Published: 8th Jul 2024

Publication Type: Journal
