Publications

Abstract

Motivation: Genomes are a rich source of information on the pattern and process of evolution across biological scales. How best to make use of that information is an active area of research in phylogenetics. Ideally, phylogenetic methods should not only model substitutions along gene trees, which explain differences between homologous gene sequences, but also the processes that generate the gene trees themselves along a shared species tree. To conduct accurate inferences, one needs to account for uncertainty at both levels, that is, in gene trees estimated from inherently short sequences and in their diverse evolutionary histories along a shared species tree. Results: We present AleRax, a software that can infer reconciled gene trees together with a shared species tree using a simple, yet powerful, probabilistic model of gene duplication, transfer, and loss. A key feature of AleRax is its ability to account for uncertainty in the gene tree and its reconciliation by using an efficient approximation to calculate the joint phylogenetic-reconciliation likelihood and sample reconciled gene trees accordingly. Simulations and analyses of empirical data show that AleRax is one order of magnitude faster than competing gene tree inference tools while attaining the same accuracy. It is consistently more robust than species tree inference methods such as SpeciesRax and ASTRAL-Pro 2 under gene tree uncertainty. Finally, AleRax can process multiple gene families in parallel, thereby allowing users to compare competing phylogenetic hypotheses and estimate model parameters, such as DTL probabilities, for genome-scale datasets with hundreds of taxa. Availability and Implementation: GNU GPL at https://github.com/BenoitMorel/AleRax and data are made available at https://cme.h-its.org/exelixis/material/alerax_data.tar.gz . Contact: Benoit.Morel@h-its.org Supplementary information: Supplementary material is available.

Authors: Benoit Morel, Tom A. Williams, Alexandros Stamatakis, Gergely J. Szöllősi

Date Published: 7th Oct 2023

Publication Type: Journal

Abstract

Phylogenetic inferences under the maximum likelihood criterion deploy heuristic tree search strategies to explore the vast search space. Depending on the input dataset, searches from different starting trees might all converge to a single tree topology. Often, though, distinct searches infer multiple topologies with large log-likelihood score differences or yield topologically highly distinct, yet almost equally likely, trees. Recently, Haag et al. introduced an approach to quantify, and implemented machine learning methods to predict, the dataset difficulty with respect to phylogenetic inference. Easy multiple sequence alignments (MSAs) exhibit a single likelihood peak on their likelihood surface, associated with a single tree topology to which most, if not all, independent searches rapidly converge. As difficulty increases, multiple locally optimal likelihood peaks emerge, yet from highly distinct topologies. To make use of this information, we introduce and implement an adaptive tree search heuristic in RAxML-NG, which modifies the thoroughness of the tree search strategy as a function of the predicted difficulty. Our adaptive strategy is based upon three observations. First, on easy datasets, searches converge rapidly and can hence be terminated at an earlier stage. Second, overanalyzing difficult datasets is hopeless, and thus it suffices to quickly infer only one of the numerous almost equally likely topologies to reduce overall execution time. Third, more extensive searches are justified and required on datasets with intermediate difficulty. While the likelihood surface exhibits multiple locally optimal peaks in this case, a small proportion of them is significantly better.
Our experimental results for the adaptive heuristic on 9,515 empirical and 5,000 simulated datasets with varying difficulty exhibit substantial speedups, especially on easy and difficult datasets (53% of total MSAs), where we observe average speedups of more than 10×. Further, approximately 94% of the inferred trees using the adaptive strategy are statistically indistinguishable from the trees inferred under the standard strategy (RAxML-NG).

Authors: Anastasis Togkousidis, Oleksiy M Kozlov, Julia Haag, Dimitri Höhler, Alexandros Stamatakis

Date Published: 1st Oct 2023

Publication Type: Journal

Abstract

Taxonomic assignment of operational taxonomic units (OTUs) is an important bioinformatics step in analyzing environmental sequencing data. Pairwise alignment and phylogenetic‐placement methods represent two alternative approaches to taxonomic assignments, but their results can differ. Here we used available colpodean ciliate OTUs from forest soils to compare the taxonomic assignments of VSEARCH (which performs pairwise alignments) and EPA‐ng (which performs phylogenetic placements). We showed that when there are differences in taxonomic assignments between pairwise alignments and phylogenetic placements at the subtaxon level, there is a low pairwise similarity of the OTUs to the reference database. We then showcase how the output of EPA‐ng can be further evaluated using GAPPA to assess the taxonomic assignments when there exist multiple equally likely placements of an OTU, by taking into account the sum over the likelihood weights of the OTU placements within a subtaxon, and the branch distances between equally likely placement locations. We also inferred the evolutionary and ecological characteristics of the colpodean OTUs using their placements within subtaxa. This study demonstrates how to fully analyze the output of EPA‐ng, by using GAPPA in conjunction with knowledge of the taxonomic diversity of the clade of interest.

Authors: Isabelle Ewers, Lubomír Rajter, Lucas Czech, Frédéric Mahé, Alexandros Stamatakis, Micah Dunthorn

Date Published: 1st Sep 2023

Publication Type: Journal

Abstract

Motivation: Simulating Multiple Sequence Alignments (MSAs) using probabilistic models of sequence evolution plays an important role in the evaluation of phylogenetic inference tools, and is crucial to the development of novel learning-based approaches for phylogenetic reconstruction, for instance, neural networks. These models and the resulting simulated data need to be as realistic as possible to be indicative of the performance of the developed tools on empirical data and to ensure that neural networks trained on simulations perform well on empirical data. Over the years, numerous models of evolution have been published with the goal to represent as faithfully as possible the sequence evolution process and thus simulate empirical-like data. In this study, we simulated DNA and protein MSAs under increasingly complex models of evolution with and without insertion/deletion (indel) events using a state-of-the-art sequence simulator. We assessed their realism by quantifying how accurately supervised learning methods are able to predict whether a given MSA is simulated or empirical. Results: Our results show that we can distinguish between empirical and simulated MSAs with high accuracy using two distinct and independently developed classification approaches across all tested models of sequence evolution. Our findings suggest that the current state-of-the-art models fail to accurately replicate several aspects of empirical MSAs, including site-wise rates as well as amino acid and nucleotide composition. Data and Code Availability: All simulated and empirical MSAs, as well as all analysis results, are available at https://cme.h-its.org/exelixis/material/simulation_study.tar.gz . All scripts required to reproduce our results are available at https://github.com/tschuelia/SimulationStudy and https://github.com/JohannaTrost/seqsharp . Contact: julia.haag@h-its.org

Authors: Johanna Trost, Julia Haag, Dimitri Höhler, Laurent Jacob, Alexandros Stamatakis, Bastien Boussau

Date Published: 12th Jul 2023

Publication Type: Journal

Abstract

Not specified

Authors: Anastasis Togkousidis, Olga Chernomor, Alexandros Stamatakis

Date Published: 1st May 2023

Publication Type: Proceedings

Abstract

Species tree-aware phylogenetic methods model how gene trees are generated along the species tree by a series of evolutionary events, including the duplication, transfer and loss of genes. Over the past ten years these methods have emerged as a powerful tool for inferring and rooting gene and species trees, inferring ancestral gene repertoires, and studying the processes of gene and genome evolution. However, these methods are complex and can be more difficult to use than traditional phylogenetic approaches. Method development is rapid, and it can be difficult to decide between approaches and interpret results. Here, we review ALE and GeneRax, two popular packages for reconciling gene and species trees, explaining how they work, how results can be interpreted, and providing a tutorial for practical analysis. It was recently suggested that reconciliation-based estimates of duplication and transfer frequencies are unreliable. We evaluate this criticism and find that, provided parameters are estimated from the data rather than being fixed based on prior assumptions, reconciliation-based inferences are in good agreement with the literature, recovering variation in gene duplication and transfer frequencies across lineages consistent with the known biology of studied clades. For example, published datasets support the view that transfers greatly outnumber duplications in most prokaryotic lineages. We conclude by discussing some limitations of current models and prospects for future progress. Significance statement: Evolutionary trees provide a framework for understanding the history of life and organising biodiversity. In this review, we discuss some recent progress on statistical methods that allow us to combine information from many different genes within the framework of an overarching phylogenetic species tree.
We review the advantages and uses of these methods and discuss case studies where they have been used to resolve deep branches within the tree of life. We conclude with the limitations of current methods and suggest how they might be overcome in the future.

Authors: Tom A. Williams, Adrian A. Davin, Benoit Morel, Lénárd L. Szánthó, Anja Spang, Alexandros Stamatakis, Philip Hugenholtz, Gergely J. Szöllősi

Date Published: 17th Mar 2023

Publication Type: Journal

Abstract

Not specified

Authors: Dilek Koptekin, Eren Yüncü, Ricardo Rodríguez-Varela, N. Ezgi Altınışık, Nikolaos Psonis, Natalia Kashuba, Sevgi Yorulmaz, Robert George, Duygu Deniz Kazancı, Damla Kaptan, Kanat Gürün, Kıvılcım Başak Vural, Hasan Can Gemici, Despoina Vassou, Evangelia Daskalaki, Cansu Karamurat, Vendela K. Lagerholm, Ömür Dilek Erdal, Emrah Kırdök, Aurelio Marangoni, Andreas Schachner, Handan Üstündağ, Ramaz Shengelia, Liana Bitadze, Mikheil Elashvili, Eleni Stravopodi, Mihriban Özbaşaran, Güneş Duru, Argyro Nafplioti, C. Brian Rose, Tuğba Gencer, Gareth Darbyshire, Alexander Gavashelishvili, Konstantine Pitskhelauri, Özlem Çevik, Osman Vuruşkan, Nina Kyparissi-Apostolika, Ali Metin Büyükkarakaya, Umay Oğuzhanoğlu, Sevinç Günel, Eugenia Tabakaki, Akper Aliev, Anar Ibrahimov, Vaqif Shadlinski, Adamantios Sampson, Gülşah Merve Kılınç, Çiğdem Atakuman, Alexandros Stamatakis, Nikos Poulakakis, Yılmaz Selim Erdal, Pavlos Pavlidis, Jan Storå, Füsun Özer, Anders Götherström, Mehmet Somel

Date Published: 2023

Publication Type: Journal
