Publications


Abstract

Summary: The evaluation of phylogenetic inference tools is commonly conducted on simulated and empirical sequence data alignments. An open question is how representative these alignments are with respect to those commonly analyzed by users. Based upon the RAxMLGrove database, it is now possible to simulate DNA sequences based on more than 70,000 representative RAxML and RAxML-NG tree inferences on empirical datasets conducted on the RAxML web servers. This allows the phylogenetic tree inference accuracy of various inference tools to be assessed on realistic and representative simulated DNA alignments. We simulated 20,000 MSAs based on representative datasets (in terms of signal strength) from RAxMLGrove, and used 5,000 datasets from the TreeBASE database, to assess the inference accuracy of FastTree2, IQ-TREE2, and RAxML-NG. We find that on quantifiably difficult-to-analyze MSAs all of the analysed tools perform poorly, such that the quicker FastTree2 can constitute a viable alternative for inferring trees. We also find that there are substantial differences between accuracy results on simulated and empirical data, despite the substantial effort undertaken to simulate sequences under settings that are as realistic as possible.

Contact: Dimitri Höhler, dimitri.hoehler@h-its.org

Authors: Dimitri Höhler, Julia Haag, Alexey M. Kozlov, Alexandros Stamatakis

Date Published: 1st Nov 2022

Publication Type: Journal

Abstract

Not specified

Authors: Julia Haag, Lukas Hübner, Alexey M. Kozlov, Alexandros Stamatakis

Date Published: 14th Jul 2022

Publication Type: Journal

Abstract

Not specified

Authors: Julia Haag, Dimitri Höhler, Ben Bettisworth, Alexandros Stamatakis

Date Published: 21st Jun 2022

Publication Type: Journal

Abstract

Phylogenetic trees represent hypothetical evolutionary relationships between organisms. Approaches for inferring phylogenetic trees include the Maximum Likelihood (ML) method. This method relies on numerical optimization routines that use internal numerical thresholds. We analyze the influence of these thresholds on the likelihood scores and runtimes of tree inferences for the ML inference tools RAxML-NG, IQ-Tree, and FastTree. We analyze 22 empirical datasets and show that we can speed up the tree inference in RAxML-NG and IQ-Tree by changing the default values of two such numerical thresholds. Using 15 additional simulated datasets, we show that these changes do not affect the accuracy of the inferred phylogenetic trees. For RAxML-NG, increasing the likelihood thresholds lh_epsilon and spr_lh_epsilon to 10 and 10^3, respectively, results in an average speedup of 1.9 ± 0.6. Increasing the likelihood threshold lh_epsilon in IQ-Tree results in an average speedup of 1.3 ± 0.4. In addition to the numerical analysis, we attempt to predict the difficulty of datasets, with the aim of preventing an unnecessarily large number of tree inferences for datasets that are easy to analyze. We present our prediction experiments and discuss why this task proved to be more challenging than anticipated.

Author: Julia Haag

Date Published: No date defined

Publication Type: Master's Thesis
