Publications

Abstract

Phylogenetic analyses under the Maximum-Likelihood (ML) model are time- and resource-intensive. To adequately capture the vastness of tree space, one needs to infer multiple independent trees. On some datasets, multiple tree inferences converge to similar tree topologies; on others, they converge to multiple, topologically highly distinct yet statistically indistinguishable topologies. At present, no method exists to quantify and predict this behavior. We introduce a method to quantify the degree of difficulty for analyzing a dataset and present Pythia, a Random Forest Regressor that accurately predicts this difficulty. Pythia predicts the degree of difficulty of analyzing a dataset prior to initiating ML-based tree inferences. Pythia can be used to increase user awareness with respect to the amount of signal and uncertainty to be expected in phylogenetic analyses, and hence inform an appropriate (post-)analysis setup. Further, it can be used to select appropriate search algorithms for easy-, intermediate-, and hard-to-analyze datasets.

Authors: Julia Haag, Dimitri Höhler, Ben Bettisworth, Alexandros Stamatakis

Date Published: 1st Dec 2022

Publication Type: Journal

Abstract

Summary: The evaluation of phylogenetic inference tools is commonly conducted on simulated and empirical sequence data alignments. An open question is how representative these alignments are with respect to those commonly analyzed by users. Based upon the RAxMLGrove database, it is now possible to simulate DNA sequences based on more than 70,000 representative RAxML and RAxML-NG tree inferences on empirical datasets conducted on the RAxML web servers. This makes it possible to assess the phylogenetic tree inference accuracy of various inference tools based on realistic and representative simulated DNA alignments. We simulated 20,000 MSAs based on representative datasets (in terms of signal strength) from RAxMLGrove, and used 5,000 datasets from the TreeBASE database, to assess the inference accuracy of FastTree2, IQ-TREE2, and RAxML-NG. We find that on quantifiably difficult-to-analyze MSAs all of the analyzed tools perform poorly, such that the quicker FastTree2 can constitute a viable alternative to infer trees. We also find that there are substantial differences between accuracy results on simulated and empirical data, despite the fact that a substantial effort was undertaken to simulate sequences under settings that are as realistic as possible. Contact: Dimitri Höhler, dimitri.hoehler@h-its.org

Authors: Dimitri Höhler, Julia Haag, Alexey M. Kozlov, Alexandros Stamatakis

Date Published: 1st Nov 2022

Publication Type: Journal

Abstract

Not specified

Authors: Lukas Hübner, Demian Hespe, Peter Sanders, Alexandros Stamatakis

Date Published: 1st Nov 2022

Publication Type: Journal

Abstract

Not specified

Authors: Julia Haag, Lukas Hübner, Alexey M. Kozlov, Alexandros Stamatakis

Date Published: 14th Jul 2022

Publication Type: Journal

Abstract

Not specified

Authors: Julia Haag, Dimitri Höhler, Ben Bettisworth, Alexandros Stamatakis

Date Published: 21st Jun 2022

Publication Type: Journal

Abstract

Not specified

Authors: Lucas Czech, Alexandros Stamatakis, Micah Dunthorn, Pierre Barbera

Date Published: 26th May 2022

Publication Type: Journal

Abstract

Not specified

Authors: Ben Bettisworth, Stephen A. Smith, Alexandros Stamatakis

Date Published: 20th Apr 2022

Publication Type: Journal
