Re-Engineering Genomic Tree Sequence Inference Algorithms

Abstract:

In the field of population genetics, the driving forces of evolution within species can be studied with trees. Along a genome, each tree describes the local ancestries of a small genomic region. Together, those trees form a tree sequence that describes the ancestry of a population at every site of the sequence. Inferring tree sequences for whole genomes with many haplotype samples is a computationally expensive task, however. The state-of-the-art tool to infer tree sequences is tsinfer, which infers ancestries for human chromosomes from 5000 samples within a few hours. The tool has the capability to parallelize the computation, but we identify a structure in the input data that limits its parallelizability. We propose a novel parallelization scheme aiming to improve scaling at high thread counts, independently of this structure. Furthermore, we propose several optimizations for the inference algorithm, improving cache efficiency and reducing the number of operations per iteration. We provide a proof-of-concept implementation, and compare the computation speed of our implementation and tsinfer. When inferring ancestries for the 1000 Genomes Project, our implementation is consistently faster by a factor of 1.9 to 2.4. Additionally, depending on the choice of parameters, our parallelization scheme scales better between 32 and 96 cores, improving its speed advantage, especially at higher core counts. In phases where our novel parallelization scheme does not apply, our optimizations still improve the runtime by a factor of 2.2. As available genomic data sets are growing rapidly in size, our contribution decreases the computation time and enables better parallelization, allowing the processing of larger data sets in reasonable time frames

SEEK ID: https://publications.h-its.org/publications/1910

Filename: JohannesHengstler.pdf 

Format: PDF document

Size: 1.23 MB

SEEK ID: https://publications.h-its.org/publications/1910

Research Groups: Computational Molecular Evolution

Publication type: Journal

Citation:

Date Published: 1st Aug 2024

URL:

Registered Mode: manually

Authors: Johannes Hengstler, Lukas Hübner, Alexandros Stamatakis

help Submitter
Activity

Views: 100   Downloads: 1

Created: 9th Jan 2025 at 13:02

help Tags

This item has not yet been tagged.

help Attributions

None

Powered by
(v.1.15.2)
Copyright © 2008 - 2024 The University of Manchester and HITS gGmbH