Publications

Abstract

Accurately reconstructing the evolutionary history of a group of organisms is a complex task. Current state-of-the-art tools use Markov chain Monte Carlo (MCMC) methods to sample the posterior tree distribution under a given model, producing phylogenetic tree distributions that reflect uncertainties in the underlying models and data. While these distributions offer very good insight into the phylogenetic history, they are computationally very expensive to obtain. In this thesis, we present and evaluate multiple heuristics that approximate these distributions with distance-based methods. To assess the quality of our heuristics, we compare the distributions they produce against a reference MCMC-based distribution using split-based and frequency-based metrics. We show that, compared to other tools, our method works well for some types of data but not for all, and that further information about the data needs to be incorporated to make it viable in practice. Our most successful method uses pairwise distance distributions to apply likelihood-supported perturbations to the input distances of the Neighbor Joining algorithm. Because this ignores the interdependencies between distances, we add parsimony filtering as a post-processing step that eliminates unlikely trees from our distributions, which significantly improves the results. Finally, we discuss the shortcomings and future potential of our heuristics for estimating pairwise distances and their interdependencies more accurately, which should lead to more competitive results.

Authors: Noah Wahl, Benoit Morel, Alexandros Stamatakis

Date Published: 1st Dec 2023

Publication Type: Master's Thesis
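The core idea of the most successful heuristic above (perturb the input distances, rerun Neighbor Joining, and collect the resulting trees into a distribution) can be illustrated with a short Python sketch. It is not the thesis implementation: the perturbation here is a hypothetical stand-in that draws each distance independently from a normal distribution with a single standard deviation `sd`, clipped at zero, instead of the likelihood-supported perturbation the abstract describes, and the parsimony filtering step is omitted. Topologies are counted by their sets of non-trivial splits. The names `neighbor_joining`, `splits`, and `sample_topologies` are illustrative only.

```python
import random
from collections import Counter


def neighbor_joining(dist):
    """Build a Neighbor Joining tree from a symmetric dict-of-dicts of distances.

    Leaves are taxon-name strings; each internal node is a tuple (left, right).
    Branch lengths are not tracked because only the topology is needed here.
    The returned root is the arbitrary final join of the last two nodes.
    """
    d = {a: dict(row) for a, row in dist.items()}  # work on a copy
    nodes = list(d)
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums r(a) = sum over the other current nodes b of d(a, b).
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]  # NJ Q-criterion
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = (a, b)  # new internal node joining a and b
        d[u] = {}
        for k in nodes:
            if k != a and k != b:
                d[u][k] = d[k][u] = (d[a][k] + d[b][k] - d[a][b]) / 2
        nodes = [k for k in nodes if k != a and k != b] + [u]
    return (nodes[0], nodes[1])


def splits(tree, taxa):
    """Return the non-trivial splits, each stored as the side without a fixed reference taxon."""
    ref = min(taxa)
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = walk(node[0]) | walk(node[1])
        side = clade if ref not in clade else taxa - clade
        if 1 < len(side) < len(taxa) - 1:
            found.add(side)
        return clade

    walk(tree)
    return frozenset(found)


def sample_topologies(mean, sd, replicates, seed=0):
    """Perturb every pairwise distance, rerun NJ, and count the resulting topologies.

    HYPOTHETICAL perturbation: each distance is drawn independently from a normal
    distribution around mean[a][b] with standard deviation sd, clipped at zero.
    """
    rng = random.Random(seed)
    names = sorted(mean)
    taxa = frozenset(names)
    counts = Counter()
    for _ in range(replicates):
        d = {a: {} for a in names}
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                d[a][b] = d[b][a] = max(0.0, rng.gauss(mean[a][b], sd))
        counts[splits(neighbor_joining(d), taxa)] += 1
    return counts


# Toy additive distances for the tree ((A,B),(C,D)): pendant edges 1.0, internal edge 0.5.
mean = {
    "A": {"B": 2.0, "C": 2.5, "D": 2.5},
    "B": {"A": 2.0, "C": 2.5, "D": 2.5},
    "C": {"A": 2.5, "B": 2.5, "D": 2.0},
    "D": {"A": 2.5, "B": 2.5, "C": 2.0},
}
for topology, n in sample_topologies(mean, sd=0.3, replicates=1000).most_common():
    print(sorted(sorted(s) for s in topology), n)
```

Because every distance is perturbed independently, the sketch exhibits exactly the weakness the abstract points out: correlated distances are never perturbed together, which is why a post-processing filter such as parsimony filtering would be needed to discard implausible trees.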

Abstract

Methods for phylogenetic inference have been developed mainly to reconstruct the evolutionary relationships of species from biological sequence data. However, these methods are also used in linguistics to infer phylogenies that describe the evolution of natural languages. In this thesis, we examine the corresponding linguistic input data. We conduct a case study on an exemplary morphosyntactic data set, examining various methods to analyze the signal it contains and to eliminate geographical information the data may include. Further, we analyze numerous linguistic data sets collected from various sources and assembled in a database. We compare these data sets to morphological data from biology, considering differences in the behavior of phylogenetic inferences with RAxML-NG. Additionally, we investigate how representing a data set as a binary or as a multi-valued multiple sequence alignment (MSA) affects tree inference. We study how to model the subjectivity associated with synonym selection in cognate data. We present probabilistic MSAs as a possible solution and show on an example data set that this might be an appropriate approach.

Authors: Luise Häuser, Julia Haag, Alexandros Stamatakis

Date Published: 17th Jun 2023

Publication Type: Master's Thesis
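To make the binary versus multi-valued distinction above concrete, the following Python sketch expands a multi-valued character matrix into binary presence/absence columns, one per observed state of each character, and turns a set of candidate states (for example, the cognate classes of several synonyms) into a probability vector. Both conventions are assumptions for illustration: the one-column-per-state encoding with `?` propagated for missing data is one common choice, the uniform spreading of probability in the hypothetical helper `to_probabilistic` is not taken from the thesis, and the toy data are invented. RAxML-NG input formats are not shown.

```python
def to_binary(msa):
    """Expand each multi-valued character into one presence/absence column per observed state.

    msa maps a language name to a list of character states; "?" marks missing data.
    Returns the column labels (character index, state) and the binary matrix.
    """
    languages = list(msa)
    n_chars = len(msa[languages[0]])
    columns = []
    for c in range(n_chars):
        states = sorted({msa[lang][c] for lang in languages if msa[lang][c] != "?"})
        columns.extend((c, s) for s in states)
    binary = {}
    for lang in languages:
        row = []
        for c, s in columns:
            v = msa[lang][c]
            # Missing data stays missing in every derived column.
            row.append("?" if v == "?" else "1" if v == s else "0")
        binary[lang] = "".join(row)
    return columns, binary


def to_probabilistic(candidates, states):
    """HYPOTHETICAL: spread probability uniformly over the candidate states."""
    return [1.0 / len(candidates) if s in candidates else 0.0 for s in states]


# Toy multi-valued MSA with two characters.
msa = {
    "English": ["a", "x"],
    "German": ["a", "y"],
    "French": ["b", "y"],
}
columns, binary = to_binary(msa)
print(columns)  # [(0, 'a'), (0, 'b'), (1, 'x'), (1, 'y')]
for lang, row in binary.items():
    print(lang, row)  # English 1010, then German 1001, then French 0101
print(to_probabilistic({"x", "y"}, ["x", "y", "z"]))  # [0.5, 0.5, 0.0]
```

The binary encoding doubles the number of columns in this example, and the two columns derived from one multi-valued character are no longer independent, which is one reason the choice of representation can change the inferred trees.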

Abstract

One of the most fundamental unanswered questions that has been bothering mankind during the Anthropocene is whether the use of swearwords in open source code is positively or negatively correlated with source code quality. To investigate this profound matter, we crawled and analysed over 3,800 open-source C code samples from GitHub that contain English swearwords and over 7,600 that do not. Using SoftWipe, a tool developed in our group, we then quantified how closely each of these two sets adheres to coding standards, which we use as a proxy for source code quality. Under several statistical tests, we find that open source code containing swearwords exhibits significantly better code quality than code without swearwords. We hypothesise that the use of swearwords indicates a profound emotional involvement of the programmer with the code and its inherent complexities, which yields better code through a thorough, critical, and dialectic code analysis process.

Authors: Jan Strehmel, Ben Bettisworth, Dimitri Höhler, Alexandros Stamatakis

Date Published: 1st Feb 2023

Publication Type: Bachelor's Thesis

Abstract

Because of rounding errors, parallel floating-point summation can produce different results on different core counts. For some algorithms, such as hill climbing, greedy algorithms, or RAxML-NG [7], this implies that results may be irreproducible across different core counts. We present the Binary Tree Reduction algorithm, which follows a distributed binary tree scheme that keeps the calculation order fixed and independent of the core count $p$. A naive implementation requires up to $(p - 1) \cdot \left(\log_2\left(\frac{N - 1}{p}\right) + 1\right)$ messages to sum $N$ floating-point numbers. To reduce the message count, we introduce a message buffer and optimize the data distribution across the cores; the latter decreases runtime by 18%. We find that for $p = 256$, Binary Tree Reduction has a slowdown of less than a factor of 2 compared to a naive, irreproducible solution. It computes the sum of $N \approx 21 \cdot 10^6$ summands on $p = 256$ cores in about 248 μs.

Authors: Christoph Stelz, Lukas Hübner, Alexandros Stamatakis

Date Published: 1st Apr 2022

Publication Type: Bachelor's Thesis
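The reproducibility argument above rests on one property: the shape of the reduction tree depends only on $N$, never on $p$. The following Python sketch is a serial simulation of that property, not the thesis implementation. It assumes a tree that splits each index range after the largest power of two smaller than its length (the thesis may use a different layout), assigns each simulated core a contiguous block, lets a core sum every subtree lying entirely inside its block, and combines partial results only at subtrees that straddle block boundaries, where a real distributed implementation would exchange messages. Message buffering, the optimized data distribution, and any MPI communication are not modeled. The function names are hypothetical, and the sketch assumes $N \geq 1$.

```python
import random


def split_point(lo, hi):
    """Split [lo, hi) after the largest power of two smaller than hi - lo.

    The split depends only on the index range, never on the core count,
    so the reduction tree is fixed by N alone (assumed layout).
    """
    half = 1
    while half * 2 < hi - lo:
        half *= 2
    return lo + half


def tree_sum(values, lo, hi):
    """Sum values[lo:hi] in the fixed binary-tree order."""
    if hi - lo == 1:
        return values[lo]
    mid = split_point(lo, hi)
    return tree_sum(values, lo, mid) + tree_sum(values, mid, hi)


def block_bounds(n, p):
    """Core r owns the contiguous block [bounds[r], bounds[r + 1])."""
    return [n * r // p for r in range(p + 1)]


def distributed_tree_sum(values, p):
    """Serially simulate p cores evaluating the same fixed tree.

    A subtree inside one core's block is summed locally by that core; only
    subtrees straddling block boundaries combine results from different cores.
    """
    bounds = block_bounds(len(values), p)

    def local(lo, hi):
        return any(bounds[r] <= lo and hi <= bounds[r + 1] for r in range(p))

    def node(lo, hi):
        if local(lo, hi):
            return tree_sum(values, lo, hi)
        mid = split_point(lo, hi)
        return node(lo, mid) + node(mid, hi)

    return node(0, len(values))


def naive_distributed_sum(values, p):
    """Each core sums its block left to right; partial sums are added in core order."""
    bounds = block_bounds(len(values), p)
    total = 0.0
    for r in range(p):
        partial = 0.0
        for x in values[bounds[r]:bounds[r + 1]]:
            partial += x
        total += partial
    return total


# Summands spanning many orders of magnitude make rounding effects visible.
rng = random.Random(42)
values = [rng.uniform(-1.0, 1.0) * 10.0 ** rng.randint(-8, 8) for _ in range(10_000)]
for p in (1, 2, 3, 7, 16, 256):
    print(p, distributed_tree_sum(values, p).hex(), naive_distributed_sum(values, p).hex())
```

Printing the sums with `float.hex()` exposes bit-level differences: the tree-based column should be identical for every `p`, while the naive column will typically change with the core count, because its grouping of additions follows the block boundaries.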

Copyright © 2008 - 2024 The University of Manchester and HITS gGmbH