Accurately reconstructing the evolutionary history of a group of organisms is a complex task. Current state-of-the-art tools use Markov chain Monte Carlo (MCMC) methods to sample the posterior tree distribution under a given model, producing phylogenetic tree distributions that reflect uncertainties in the underlying models and data. While these distributions offer detailed insight into the phylogenetic history, they are computationally expensive to obtain. In this thesis we present and evaluate multiple heuristics to approximate these distributions with distance-based methods. To judge the quality of our heuristics, we compare our distributions against a reference MCMC-based distribution using split-based and frequency-based metrics. We show that, compared to other tools, our method works well for some types of data but not all, and that further information about the data needs to be incorporated to make the approach viable in practice. Our most successful method uses pairwise distance distributions to apply likelihood-supported perturbation to the input distances for the Neighbor Joining algorithm. Because this ignores the interdependencies between distances, we add parsimony filtering as a post-processing step to eliminate unlikely trees from our distributions, which significantly improves the results. Finally, we discuss the shortcomings of our heuristics and their future potential: more accurate estimates of pairwise distances and their interdependencies should lead to more competitive results.
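The general idea of perturbing input distances and rerunning Neighbor Joining can be illustrated with a minimal sketch. This is not the thesis implementation: the function names `nj_splits` and `perturbed_split_frequencies` are hypothetical, the multiplicative Gaussian noise controlled by `sigma` is an assumed stand-in for the likelihood-supported pairwise distance distributions, and the parsimony filtering step is omitted.

```python
import random
from collections import Counter


def nj_splits(dist, names):
    """Run Neighbor Joining; return the tree's non-trivial splits.

    Each split is stored as the side that does not contain names[0].
    """
    n0 = len(names)
    all_leaves = frozenset(names)
    # Active nodes, each mapped to the set of leaves beneath it.
    clusters = {i: frozenset([names[i]]) for i in range(n0)}
    d = {(i, j): dist[i][j] for i in range(n0) for j in range(n0) if i != j}
    splits = set()
    next_id = n0
    while len(clusters) > 3:
        nodes = list(clusters)
        n = len(nodes)
        # Row sums used by the NJ selection criterion.
        r = {i: sum(d[i, k] for k in nodes if k != i) for i in nodes}
        pairs = [(i, j) for a, i in enumerate(nodes) for j in nodes[a + 1:]]
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(pairs, key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        u = next_id
        next_id += 1
        # Distances from the new internal node u to all remaining nodes.
        for k in nodes:
            if k != i and k != j:
                d[u, k] = d[k, u] = 0.5 * (d[i, k] + d[j, k] - d[i, j])
        merged = clusters.pop(i) | clusters.pop(j)
        clusters[u] = merged
        side = all_leaves - merged if names[0] in merged else merged
        splits.add(side)
    return splits


def perturbed_split_frequencies(dist, names, n_samples=100, sigma=0.1, seed=0):
    """Estimate split frequencies from NJ trees built on perturbed distances.

    ASSUMPTION: every pairwise distance is scaled independently by
    max(0, 1 + N(0, sigma)); the thesis uses likelihood-supported
    distance distributions instead.
    """
    rng = random.Random(seed)
    n = len(names)
    counts = Counter()
    for _ in range(n_samples):
        noisy = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                factor = max(0.0, 1.0 + rng.gauss(0.0, sigma))
                noisy[i][j] = noisy[j][i] = dist[i][j] * factor
        counts.update(nj_splits(noisy, names))
    return {split: c / n_samples for split, c in counts.items()}
```

Calling `perturbed_split_frequencies(dist, names)` with a symmetric distance matrix and a matching list of taxon names returns a mapping from splits to their relative frequency across the sampled trees. Frequencies of this kind are what a split-based comparison against an MCMC reference distribution would operate on. Because each distance is perturbed independently, the sketch shares the limitation named in the abstract: it ignores interdependencies between distances.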
SEEK ID: https://publications.h-its.org/publications/1914
Filename: thesisNoah.pdf
Format: PDF document
Size: 1.02 MB
Research Groups: Computational Molecular Evolution
Publication type: Master's Thesis
Created: 9th Jan 2025 at 13:13