Publications

Abstract

Not specified

Authors: Leif Seute, Eric Hartmann, Jan Stühmer, Frauke Gräter

Date Published: 25th Mar 2024

Publication Type: InProceedings

Abstract

In traditional studies on language evolution, scholars often emphasize the importance of sound laws and sound correspondences for phylogenetic inference of language family trees. However, to date, computational approaches have typically not taken this potential into account. Most computational studies still rely on lexical cognates as the major data source for phylogenetic reconstruction in linguistics, although there do exist a few studies in which authors praise the benefits of comparing words at the level of sound sequences. Building on (a) ten diverse datasets from different language families, and (b) state-of-the-art methods for automated cognate and sound correspondence detection, we test, for the first time, the performance of sound-based versus cognate-based approaches to phylogenetic reconstruction. Our results show that phylogenies reconstructed from lexical cognates are topologically closer, by approximately one third with respect to the generalized quartet distance on average, to the gold standard phylogenies than phylogenies reconstructed from sound correspondences.

Authors: Luise Häuser, Gerhard Jäger, Johann-Mattis List, Taraka Rama, Alexandros Stamatakis

Date Published: 22nd Mar 2024

Publication Type: Proceedings

Abstract

Not specified

Editor:

Date Published: 18th Mar 2024

Publication Type: Master's Thesis

Abstract

Not specified

Authors: Xianghe Ma, Michael Strube, Wei Zhao

Date Published: 17th Mar 2024

Publication Type: InProceedings

Abstract

Motivation: Genotype datasets typically contain a large number of single nucleotide polymorphisms for a comparatively small number of individuals. To identify similarities between individuals and to infer an individual’s origin or membership to a cultural group, dimensionality reduction techniques are routinely deployed. However, inherent (technical) difficulties such as missing or noisy data need to be accounted for when analyzing a lower dimensional representation of genotype data, and the uncertainty of such an analysis should be reported in all studies. However, to date, there exists no stability estimation technique for genotype data that can estimate this uncertainty. Results: Here, we present Pandora, a stability estimation framework for genotype data based on bootstrapping. Pandora computes an overall score to quantify the stability of the entire embedding, per-individual support values, and deploys a k-means clustering approach to assess the uncertainty of assignments to potential cultural groups. In addition to this bootstrap-based stability estimation, Pandora offers a sliding-window stability estimation for whole-genome data. Using published empirical and simulated datasets, we demonstrate the usage and utility of Pandora for studies that rely on dimensionality reduction techniques. Data and Code Availability: Pandora is available on GitHub at https://github.com/tschuelia/Pandora. All Python scripts and data to reproduce our results are available on GitHub at https://github.com/tschuelia/PandoraPaper.

Authors: Julia Haag, Alexander I. Jordan, Alexandros Stamatakis

Date Published: 15th Mar 2024

Publication Type: Journal

Abstract

Not specified

Authors: Jack Fosten, Daniel Gutknecht, Marc-Oliver Pohle

Date Published: 12th Mar 2024

Publication Type: Journal

Abstract

Estimating the statistical robustness of the inferred tree(s) constitutes an integral part of most phylogenetic analyses. Commonly, one computes and assigns a branch support value to each inner branch of the inferred phylogeny. The most widely used method for calculating branch support on trees inferred under Maximum Likelihood (ML) is the Standard, non-parametric Felsenstein Bootstrap Support (SBS). Due to the high computational cost of the SBS, a plethora of methods has been developed to approximate it, for instance, via the Rapid Bootstrap (RB) algorithm. There have also been attempts to devise faster, alternative support measures, such as the SH-aLRT (Shimodaira–Hasegawa-like approximate Likelihood Ratio Test) or the UltraFast Bootstrap 2 (UFBoot2) method. Those faster alternatives exhibit some limitations, such as the need to assess model violations (UFBoot2) or meaningless low branch support intervals (SH-aLRT). Here, we present the Educated Bootstrap Guesser (EBG), a machine learning-based tool that predicts SBS branch support values for a given input phylogeny. EBG is on average 9.4 (σ = 5.5) times faster than UFBoot2. EBG-based SBS estimates exhibit a median absolute error of 5 when predicting SBS values between 0 and 100. Furthermore, EBG also provides uncertainty measures for all per-branch SBS predictions and thereby allows for a more rigorous and careful interpretation. EBG can predict SBS support values on a phylogeny comprising 1654 SARS-CoV2 genome sequences within 3 hours on a mid-class laptop. EBG is available under GNU GPL3.

Authors: Julius Wiegert, Dimitri Höhler, Julia Haag, Alexandros Stamatakis

Date Published: 6th Mar 2024

Publication Type: Journal
