Publications

10 publications

Abstract

The Message-Passing Interface (MPI) and C++ form the backbone of high-performance computing, but MPI only provides C and Fortran bindings. While this offers great language interoperability, high-level programming languages like C++ make software development quicker and less error-prone. We propose novel C++ language bindings that cover all abstraction levels from low-level MPI calls to convenient STL-style bindings, where most parameters are inferred from a small subset of parameters, by bringing named parameters to C++. This enables rapid prototyping and fine-tuning of runtime behavior and memory management. A flexible type system and additional safety guarantees help to prevent programming errors. By exploiting C++'s template metaprogramming capabilities, this incurs (near) zero overhead, as only the required code paths are generated at compile time. We demonstrate that our library is a strong foundation for a future distributed standard library using multiple application benchmarks, ranging from textbook sorting algorithms to phylogenetic inference.

Authors: Tim Niklas Uhl, Matthias Schimek, Lukas Hübner, Demian Hespe, Florian Kurpicz, Daniel Seemaier, Christoph Stelz, Peter Sanders

Date Published: 17th Nov 2024

Publication Type: Proceedings

Abstract

In the field of population genetics, the driving forces of evolution within species can be studied with trees. Along a genome, each tree describes the local ancestries of a small genomic region. Together, those trees form a tree sequence that describes the ancestry of a population at every site of the sequence. Inferring tree sequences for whole genomes with many haplotype samples is, however, a computationally expensive task. The state-of-the-art tool to infer tree sequences is tsinfer, which infers ancestries for human chromosomes from 5000 samples within a few hours. The tool has the capability to parallelize the computation, but we identify a structure in the input data that limits its parallelizability. We propose a novel parallelization scheme aiming to improve scaling at high thread counts, independently of this structure. Furthermore, we propose several optimizations for the inference algorithm, improving cache efficiency and reducing the number of operations per iteration. We provide a proof-of-concept implementation and compare the computation speed of our implementation and tsinfer. When inferring ancestries for the 1000 Genomes Project, our implementation is consistently faster by a factor of 1.9 to 2.4. Additionally, depending on the choice of parameters, our parallelization scheme scales better between 32 and 96 cores, improving its speed advantage, especially at higher core counts. In phases where our novel parallelization scheme does not apply, our optimizations still improve the runtime by a factor of 2.2. As available genomic data sets are growing rapidly in size, our contribution decreases the computation time and enables better parallelization, allowing the processing of larger data sets in reasonable time frames.

Authors: Johannes Hengstler, Lukas Hübner, Alexandros Stamatakis

Date Published: 1st Aug 2024

Publication Type: Journal

Abstract

The Message-Passing Interface (MPI) and C++ form the backbone of high-performance computing and algorithmic research in the field of distributed-memory computing, but MPI only provides C and Fortran bindings. This provides good language interoperability, but higher-level programming languages make development quicker and less error-prone. We propose novel C++ language bindings designed to cover the whole range of abstraction levels from low-level MPI calls to convenient STL-style bindings, where most parameters are inferred from a small subset of the full parameter set. This allows for both rapid prototyping and fine-tuning of distributed code with predictable runtime behavior and memory management. By using template metaprogramming, only the code paths required for computing missing parameters are generated at compile time, which results in (near) zero-overhead bindings.

Authors: Kunal Agrawal, Erez Petrank, Demian Hespe, Lukas Hübner, Florian Kurpicz, Peter Sanders, Matthias Schimek, Daniel Seemaier, Tim Niklas Uhl

Date Published: 17th Jun 2024

Publication Type: Proceedings

Abstract

When analyzing DNA sequencing data to find disease-causing mutations, to understand evolutionary relationships between species, and to identify variants, DNA reads are compared to a reference genome. A reference genome is a representative example of the gene set of a species. Sorting these aligned DNA reads by their position within the reference sequence is a crucial step in many of these downstream analyses. SAMtools sort, a widely used tool, performs external-memory sorting of aligned DNA reads stored in the BAM (Binary Alignment Map) format, which allows for compressed storage of alignment data. SAMtools sort provides the most comprehensive set of features while exhibiting demonstrably faster execution times than its open-source alternatives. In this work, we analyze SAMtools sort for sorting BAM files and propose methods to reduce its runtime. We divide the analysis into three parts: management of temporary files, compression, and input/output (IO). For the management of temporary files, we find that the maximum number of temporary files SAMtools sort can open concurrently is lower than the maximum number of open files permitted by the operating system. This results in an unnecessarily high number of merges of temporary files into larger temporary files, introducing overhead as SAMtools sort performs extra write and compression operations. To overcome this, we propose a dynamic limit for the number of temporary files, adapting to the operating system's soft limit for open files. For compression, we test seven different libraries for compatible compression and a range of compression levels, identifying options that offer faster compression and result in a speedup of up to five times in single-threaded execution of SAMtools sort. For IO, we demonstrate that a minimal level of compression avoids IO overhead, thereby reducing the runtime of SAMtools sort compared to uncompressed output.
However, we also show that uncompressed output can be used when pipelining SAMtools commands to reduce the runtime of subsequent SAMtools commands. Our proposed modifications to SAMtools sort and to user behavior have the potential to achieve speedups of up to a factor of 6. This represents an important contribution to the field of bioinformatics, considering the widespread adoption of SAMtools sort, evidenced by its over 5,000 citations and over 5.1 million downloads through Bioconda.

Authors: Dominik Siebelt, Lukas Hübner, Alexandros Stamatakis

Date Published: 3rd Jun 2024

Publication Type: Bachelor's Thesis

Abstract

The field of population genetics attempts to advance our understanding of evolutionary processes. It has applications, for example, in medical research, wildlife conservation, and – in conjunction with recent advances in ancient DNA sequencing technology – studying human migration patterns over the past few thousand years. The basic toolbox of population genetics includes genealogical trees, which describe the shared evolutionary history among individuals of the same species. They are calculated on the basis of genetic variations. However, in recombining organisms, a single tree is insufficient to describe the evolutionary history of the whole genome. Instead, a collection of correlated trees can be used, where each describes the evolutionary history of a consecutive region of the genome. The current state-of-the-art data structure, tree sequences, compresses these genealogical trees via edit operations when moving from one tree to the next along the genome, instead of storing the full, often redundant, description for each tree. We propose a new data structure, genealogical forests, which compresses the set of genealogical trees into a DAG. In this DAG, identical subtrees that are shared across the input trees are encoded only once, thereby allowing for straightforward memoization of intermediate results. Additionally, we provide a C++ implementation of our proposed data structure, called gfkit, which is 2.1 to 11.2 (median 4.0) times faster than the state-of-the-art tool on empirical and simulated datasets at computing important population genetics statistics such as the Allele Frequency Spectrum, Patterson's f, the Fixation Index, Tajima's D, pairwise Lowest Common Ancestors, and others. On Lowest Common Ancestor queries with more than two samples as input, gfkit scales asymptotically better than the state-of-the-art, and is thus up to 990 times faster.
In conclusion, our proposed data structure compresses genealogical trees by storing shared subtrees only once, thereby enabling straightforward memoization of intermediate results, yielding a substantial runtime reduction and a potentially more intuitive data representation compared to the state-of-the-art. Our improvements will boost the development of novel analyses and models in the field of population genetics and increase scalability to ever-growing genomic datasets.
2012 ACM Subject Classification: Applied computing → Computational genomics; Applied computing → Molecular sequence analysis; Applied computing → Bioinformatics; Applied computing → Population genetics

Authors: Lukas Hübner, Alexandros Stamatakis

Date Published: 27th May 2024

Publication Type: Proceedings

Abstract

Summary: Maximum likelihood (ML) is a widely used phylogenetic inference method. ML implementations heavily rely on numerical optimization routines that use internal numerical thresholds to determine convergence. We systematically analyze the impact of these threshold settings on the log-likelihood and runtimes of ML tree inferences with RAxML-NG, IQ-TREE, and FastTree on empirical datasets. We provide empirical evidence that we can substantially accelerate tree inferences with RAxML-NG and IQ-TREE by changing the default values of two such numerical thresholds. At the same time, altering these settings does not significantly impact the quality of the inferred trees. We further show that increasing both thresholds accelerates the RAxML-NG bootstrap without influencing the resulting support values. For RAxML-NG, increasing the likelihood thresholds ϵLnL and ϵbrlen to 10 and 10³, respectively, results in an average tree inference speedup of 1.9 ± 0.6 on Data collection 1, 1.8 ± 1.1 on Data collection 2, and 1.9 ± 0.8 on Data collection 2 for the RAxML-NG bootstrap, compared to the runtime under the current default setting. Increasing the likelihood threshold ϵLnL to 10 in IQ-TREE results in an average tree inference speedup of 1.3 ± 0.4 on Data collection 1 and 1.3 ± 0.9 on Data collection 2.
Availability and implementation: All MSAs we used for our analyses, as well as all results, are available for download at https://cme.h-its.org/exelixis/material/freeLunch_data.tar.gz. Our data generation scripts are available at https://github.com/tschuelia/ml-numerical-analysis.

Authors: Julia Haag, Lukas Hübner, Alexey M Kozlov, Alexandros Stamatakis

Date Published: 2023

Publication Type: Journal

Abstract

Not specified

Authors: Lukas Hübner, Demian Hespe, Peter Sanders, Alexandros Stamatakis

Date Published: 1st Nov 2022

Publication Type: Journal
