Publications

135 Publications visible to you, out of a total of 135

Abstract

Phylogenetic trees represent hypothetical evolutionary relationships between organisms. Approaches for inferring phylogenetic trees include the Maximum Likelihood (ML) method. This method relies on numerical optimization routines that use internal numerical thresholds. We analyze the influence of these thresholds on the likelihood scores and runtimes of tree inferences for the ML inference tools RAxML-NG, IQ-Tree, and FastTree. We analyze 22 empirical datasets and show that we can speed up the tree inference in RAxML-NG and IQ-Tree by changing the default values of two such numerical thresholds. Using 15 additional simulated datasets, we show that these changes do not affect the accuracy of the inferred phylogenetic trees. For RAxML-NG, increasing the likelihood thresholds lh_epsilon and spr_lh_epsilon to 10 and 10^3, respectively, results in an average speedup of 1.9 ± 0.6. Increasing the likelihood threshold lh_epsilon in IQ-Tree results in an average speedup of 1.3 ± 0.4. In addition to the numerical analysis, we attempt to predict the difficulty of datasets, with the aim of preventing an unnecessarily large number of tree inferences for datasets that are easy to analyze. We present our prediction experiments and discuss why this task proved to be more challenging than anticipated.

Author: Julia Haag

Date Published: No date defined

Publication Type: Master's Thesis

Abstract

Phylogenetics, the study of evolutionary relationships among biological entities, plays an essential role in biological and medical research. Its applications range from answering fundamental questions, such as understanding the origin of life, to solving more practical problems, such as tracking pandemics in real time. Nowadays, phylogenetic trees are typically inferred from molecular data via likelihood-based methods. Those methods strive to find the tree that maximizes a likelihood score under a given stochastic model of sequence evolution. This work focuses on the inference of species as well as gene phylogenetic trees. Species evolve through speciation and extinction events. Genes evolve through events such as gene duplication, gene loss, and horizontal gene transfer. Both processes are strongly correlated, because genes belong to species and evolve within their genomes. One can deploy models of gene evolution to exploit this correlation between species and gene evolutionary histories, in order to improve the accuracy of phylogenetic tree inference methods. However, the most widely used phylogenetic tree inference methods disregard these phenomena and focus on models of sequence evolution only. In addition, current maximum likelihood methods are computationally expensive. This is particularly challenging as the community faces a dramatically growing amount of available molecular data, due to recent advances in sequencing technologies. To handle this data avalanche, we urgently need tools that offer faster algorithms, as well as efficient parallel implementations. In this thesis, I develop new maximum likelihood methods that explicitly model the relationships between species and gene histories, in order to infer more accurate phylogenetic trees. Those methods employ both new heuristics and dedicated parallelization schemes in order to accelerate the inference process.
My first project, ParGenes, is a parallel software pipeline for inferring gene family trees from a set of per-gene multiple sequence alignments. For each input alignment, it determines the best-fit model of sequence evolution, and subsequently searches for the gene family tree with the highest likelihood under this model. To this end, ParGenes uses several state-of-the-art tools, and runs them in parallel using a novel scheduling strategy. My second project, SpeciesRax, is a method for inferring a rooted species tree from a set of unrooted gene family trees. SpeciesRax strives to find the rooted species tree that maximizes the likelihood score under a dedicated model of gene evolution that accounts for gene duplication, gene loss, and horizontal gene transfer. In addition, I introduce a new method for assessing the confidence in the resulting species tree, as well as a novel method for estimating its branch lengths. My third project, GeneRax, is a novel maximum likelihood method for gene family tree inference. GeneRax takes as input a rooted species tree as well as a set of (per-gene) multiple sequence alignments, and outputs one gene family tree per input alignment. To this end, I introduce the so-called joint likelihood function, which combines a model of sequence evolution with a model of gene evolution. In addition, GeneRax can estimate the pattern of gene duplication, gene loss, and horizontal gene transfer events that occurred along the input species tree.

Author: Benoit Morel

Date Published: No date defined

Publication Type: Doctoral Thesis

Copyright © 2008 - 2023 The University of Manchester and HITS gGmbH