Quantitative Analysis and Characterization of Natural Language Evolution Datasets

Abstract:

Methods for phylogenetic inference have been developed mainly to reconstruct the evolutionary relationships of species from biological sequence data. However, these methods are also used in linguistics to infer phylogenies describing the evolution of natural languages. In this thesis, we examine the corresponding linguistic input data. We conduct a case study on an exemplary morphosyntactic data set, examining various methods to analyze the signal it contains and to eliminate geographical information the data may include. Further, we analyze numerous linguistic data sets collected from various sources and assembled in a database. We compare these data sets with morphological data from biology, considering differences in the behavior of phylogenetic inferences with RAxML-NG. Additionally, we investigate how representing a data set as a binary or as a multi-valued multiple sequence alignment (MSA) affects the tree inferences. We study how to model the subjectivity associated with synonym selection in cognate data. We present probabilistic MSAs as a possible solution and show on an example data set that this may be an appropriate approach.

SEEK ID: https://publications.h-its.org/publications/1916

Filename: luise.pdf 

Format: PDF document

Size: 1.49 MB


Research Groups: Computational Molecular Evolution

Publication type: Master's Thesis

Date Published: 17th Jun 2023

Registered Mode: manually

Created: 9th Jan 2025 at 13:19

Last updated: 9th Jan 2025 at 13:19
