Pandora: A Tool to Estimate Dimensionality Reduction Stability of Genotype Data

Abstract:

Motivation: Genotype datasets typically contain a large number of single nucleotide polymorphisms for a comparatively small number of individuals. To identify similarities between individuals and to infer an individual’s origin or membership in a cultural group, dimensionality reduction techniques are routinely deployed. However, inherent (technical) difficulties such as missing or noisy data need to be accounted for when analyzing a lower dimensional representation of genotype data, and the uncertainty of such an analysis should be reported in all studies. To date, no stability estimation technique exists that can estimate this uncertainty for genotype data.

Results: Here, we present Pandora, a stability estimation framework for genotype data based on bootstrapping. Pandora computes an overall score that quantifies the stability of the entire embedding as well as per-individual support values, and deploys a k-means clustering approach to assess the uncertainty of assignments to potential cultural groups. In addition to this bootstrap-based stability estimation, Pandora offers a sliding-window stability estimation for whole-genome data. Using published empirical and simulated datasets, we demonstrate the usage and utility of Pandora for studies that rely on dimensionality reduction techniques.
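
The general idea of a bootstrap-based stability score can be illustrated with a minimal sketch. It is not Pandora's implementation or API: the function names, the choice to resample single nucleotide polymorphisms (columns) with replacement, the use of PCA as the embedding, and the Procrustes-based similarity are all assumptions made for illustration, and the genotype matrices are synthetic.

```python
import numpy as np


def pca_embed(genotypes, n_components=2):
    """Project individuals (rows) onto the top principal components via SVD."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def procrustes_similarity(a, b):
    """Similarity in [0, 1] after optimal translation, scaling, and rotation."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    # The sum of singular values of a^T b is the best achievable overlap
    # over all orthogonal transformations (rotations and reflections).
    singular_values = np.linalg.svd(a.T @ b, compute_uv=False)
    return float(np.clip(singular_values.sum(), 0.0, 1.0))


def bootstrap_stability(genotypes, n_bootstraps=20, n_components=2, seed=0):
    """Mean pairwise similarity of embeddings computed on bootstrapped columns.

    Assumption of this sketch: a bootstrap replicate resamples the columns
    (polymorphic sites) with replacement and keeps all individuals.
    """
    rng = np.random.default_rng(seed)
    n_sites = genotypes.shape[1]
    embeddings = []
    for _ in range(n_bootstraps):
        cols = rng.integers(0, n_sites, size=n_sites)
        embeddings.append(pca_embed(genotypes[:, cols], n_components))
    scores = [
        procrustes_similarity(embeddings[i], embeddings[j])
        for i in range(n_bootstraps)
        for j in range(i + 1, n_bootstraps)
    ]
    return float(np.mean(scores))


# Synthetic data: 40 individuals, 500 sites, genotypes coded as 0/1/2.
rng = np.random.default_rng(42)
# Two populations with clearly different allele frequencies.
structured = np.vstack([
    rng.binomial(2, 0.1, size=(20, 500)),
    rng.binomial(2, 0.9, size=(20, 500)),
]).astype(float)
# No population structure at all.
noise = rng.integers(0, 3, size=(40, 500)).astype(float)

structured_score = bootstrap_stability(structured)
noise_score = bootstrap_stability(noise)
print(f"structured: {structured_score:.3f}, noise: {noise_score:.3f}")
```

In this toy setting, the dataset with two well-separated populations should yield a score close to 1, while the unstructured dataset should score noticeably lower, because its leading principal components change from one bootstrap replicate to the next.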

Data and Code Availability: Pandora is available on GitHub at https://github.com/tschuelia/Pandora. All Python scripts and data to reproduce our results are available on GitHub at https://github.com/tschuelia/PandoraPaper.

Citation: bioRxiv 2024.03.14.584962v1 [Preprint]

Date Published: 15th Mar 2024

Registered Mode: by DOI

Citation
Haag, J., Jordan, A. I., & Stamatakis, A. (2024). Pandora: A Tool to Estimate Dimensionality Reduction Stability of Genotype Data. bioRxiv [Preprint]. Cold Spring Harbor Laboratory. https://doi.org/10.1101/2024.03.14.584962