Publications

107 Publications visible to you, out of a total of 107

Abstract

In addition to the ubiquitous big data, one key challenge in data processing and management in the life sciences is the diversity of small data. Diverse pieces of small data have to be transformed into standards-compliant data. Here, the challenge lies not in the difficulty of single steps that need to be performed, but rather in the fact that many transformation tasks are to be performed once or only a few times. This limits the time that can be put into automated approaches, which in turn severely limits the verifiability of such transformations. As much of the data to be processed is stored in spreadsheets, within this paper we justify and propose a lightweight recording-based solution that works on a wide variety of spreadsheet programs, from Microsoft Excel to Google Docs.

Authors: Wolfgang Müller, Lukrecia Mertova

Date Published: 23rd Feb 2023

Publication Type: Journal

Abstract

The design of biocatalytic reaction systems is highly complex owing to the dependency of the estimated kinetic parameters on the enzyme, the reaction conditions, and the modeling method. Consequently, reproducibility of enzymatic experiments and reusability of enzymatic data are challenging. We developed the XML-based markup language EnzymeML to enable storage and exchange of enzymatic data such as reaction conditions, the time course of the substrate and the product, kinetic parameters and the kinetic model, thus making enzymatic data findable, accessible, interoperable and reusable (FAIR). The feasibility and usefulness of the EnzymeML toolbox are demonstrated in six scenarios, for which data and metadata of different enzymatic reactions are collected and analyzed. EnzymeML serves as a seamless communication channel between experimental platforms, electronic lab notebooks, tools for modeling of enzyme kinetics, publication platforms and enzymatic reaction databases. EnzymeML is open and transparent, and invites the community to contribute. All documents and codes are freely available at https://enzymeml.org.

Authors: S. Lauterbach, H. Dienhart, J. Range, S. Malzacher, J. D. Sporing, D. Rother, M. F. Pinto, P. Martins, C. E. Lagerman, A. S. Bommarius, A. V. Host, J. M. Woodley, S. Ngubane, T. Kudanga, F. T. Bergmann, J. M. Rohwer, D. Iglezakis, A. Weidemann, U. Wittig, C. Kettner, N. Swainston, S. Schnell, J. Pleiss

Date Published: 9th Feb 2023

Publication Type: Journal

Abstract

The BioCreative National Library of Medicine (NLM)-Chem track calls for a community effort to fine-tune automated recognition of chemical names in the biomedical literature. Chemicals are one of the most searched biomedical entities in PubMed, and, as highlighted during the coronavirus disease 2019 pandemic, their identification may significantly advance research in multiple biomedical subfields. While previous community challenges focused on identifying chemical names mentioned in titles and abstracts, the full text contains valuable additional detail. We, therefore, organized the BioCreative NLM-Chem track as a community effort to address automated chemical entity recognition in full-text articles. The track consisted of two tasks: (i) chemical identification and (ii) chemical indexing. The chemical identification task required predicting all chemicals mentioned in recently published full-text articles, both span [i.e. named entity recognition (NER)] and normalization (i.e. entity linking), using Medical Subject Headings (MeSH). The chemical indexing task required identifying which chemicals reflect topics for each article and should therefore appear in the listing of MeSH terms for the document in the MEDLINE article indexing. This manuscript summarizes the BioCreative NLM-Chem track and post-challenge experiments. We received a total of 85 submissions from 17 teams worldwide. The highest performance achieved for the chemical identification task was 0.8672 F-score (0.8759 precision and 0.8587 recall) for strict NER performance and 0.8136 F-score (0.8621 precision and 0.7702 recall) for strict normalization performance. The highest performance achieved for the chemical indexing task was 0.6073 F-score (0.7417 precision and 0.5141 recall).
This community challenge demonstrated that (i) the current substantial achievements in deep learning technologies can be utilized to improve automated prediction accuracy further and (ii) the chemical indexing task is substantially more challenging. We look forward to further developing biomedical text-mining methods to respond to the rapid growth of biomedical literature. The NLM-Chem track dataset and other challenge materials are publicly available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BC7-NLM-Chem-track/. Database URL: https://ftp.ncbi.nlm.nih.gov/pub/lu/BC7-NLM-Chem-track/
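
The precision, recall and F-score figures quoted in this abstract can be related with a short calculation. The sketch below assumes the reported "F-score" is the balanced F1 measure (the harmonic mean of precision and recall); the abstract itself does not say which variant was used. The function name `f_score` and the dictionary keys are illustrative only.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is F1; the source does not state this.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the abstract above.
reported = {
    "strict NER": (0.8759, 0.8587),
    "strict normalization": (0.8621, 0.7702),
    "chemical indexing": (0.7417, 0.5141),
}

# Recompute the F-score for each task from its precision and recall.
scores = {task: f_score(p, r) for task, (p, r) in reported.items()}

for task, value in scores.items():
    print(f"{task}: F1 = {value:.4f}")
```

Under that assumption the recomputed values agree with the reported F-scores to within rounding, which also makes the gap between the identification and indexing tasks easy to see.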

Authors: Robert Leaman, Rezarta Islamaj, Virginia Adams, Mohammed A Alliheedi, João Rafael Almeida, Rui Antunes, Robert Bevan, Yung-Chun Chang, Arslan Erdengasileng, Matthew Hodgskiss, Ryuki Ida, Hyunjae Kim, Keqiao Li, Robert E Mercer, Lukrécia Mertová, Ghadeer Mobasher, Hoo-Chang Shin, Mujeen Sung, Tomoki Tsujimura, Wen-Chao Yeh, Zhiyong Lu

Date Published: 2023

Publication Type: Journal

Abstract

Fine-tuning biomedical pre-trained language models (BioPLMs) such as BioBERT has become a common practice dominating leaderboards across various natural language processing tasks. Despite their success and wide adoption, prevailing fine-tuning approaches for named entity recognition (NER) naively train BioPLMs on targeted datasets without considering class distributions. This is problematic especially when dealing with imbalanced biomedical gold-standard datasets for NER in which most biomedical entities are underrepresented. In this paper, we address the class imbalance problem and propose WeLT, a cost-sensitive fine-tuning approach based on new re-scaled class weights for the task of biomedical NER. We evaluate WeLT’s fine-tuning performance on mixed-domain and domain-specific BioPLMs using eight biomedical gold-standard datasets. We compare our approach against vanilla fine-tuning and three other existing re-weighting schemes. Our results show the positive impact of handling the class imbalance problem. WeLT outperforms all the vanilla fine-tuned models. Furthermore, our method demonstrates advantages over other existing weighting schemes in most experiments.
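
To make the idea of cost-sensitive training concrete, here is a minimal sketch of a class-weighted cross-entropy loss. It uses plain inverse-frequency weights, which is a generic re-weighting scheme and not the re-scaled weights proposed for WeLT; the abstract does not give that formula. The function names, the tag set and the toy probabilities are all hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Generic inverse-frequency class weights (NOT the WeLT weights).

    weight_c = N / (K * count_c), so rare classes get larger weights.
    """
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {c: total / (n_classes * n) for c, n in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean negative log-likelihood of the gold labels.

    probs: one dict per token mapping class name -> predicted probability.
    """
    num = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    den = sum(weights[y] for y in labels)
    return num / den


# Toy imbalanced NER tag sequence (hypothetical): 8 "O" tokens, 2 entity tokens.
labels = ["O"] * 8 + ["B-Chem", "I-Chem"]
weights = inverse_frequency_weights(labels)

# A model that is confident on "O" but poor on the rare entity tags.
probs = [{"O": 0.9, "B-Chem": 0.05, "I-Chem": 0.05}] * 8 + [
    {"O": 0.8, "B-Chem": 0.1, "I-Chem": 0.1},
    {"O": 0.8, "B-Chem": 0.1, "I-Chem": 0.1},
]

uniform = {c: 1.0 for c in weights}
plain_loss = weighted_cross_entropy(probs, labels, uniform)
weighted_loss = weighted_cross_entropy(probs, labels, weights)
print(f"unweighted: {plain_loss:.3f}, class-weighted: {weighted_loss:.3f}")
```

In this toy case the class-weighted loss is larger than the unweighted one, because mistakes on the underrepresented entity tags are no longer drowned out by the many easy "O" tokens.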

Authors: Ghadeer Mobasher, Wolfgang Müller, Olga Krebs, Michael Gertz

Date Published: 2023

Publication Type: Proceedings

Abstract

The COVID-19 Disease Map project is a large-scale community effort uniting 277 scientists from 130 institutions around the globe. We use high-quality, mechanistic content describing SARS-CoV-2-host interactions and develop interoperable bioinformatic pipelines for novel target identification and drug repurposing. Community-driven and highly interdisciplinary, the project is collaborative and supports community standards, open access, and the FAIR data principles. The coordination of community work allowed for an impressive step forward in building interfaces between Systems Biology tools and platforms. Our framework links key molecules highlighted from broad omics data analysis and computational modeling to dysregulated pathways in a cell-, tissue- or patient-specific manner. We also employ text mining and AI-assisted analysis to identify potential drugs and drug targets and use topological analysis to reveal interesting structural features of the map. The proposed framework is versatile and expandable, offering a significant upgrade in the arsenal used to understand virus-host interactions and other complex pathologies.

Authors: Anna Niarakis, Marek Ostaszewski, Alexander Mazein, Inna Kuperstein, Martina Kutmon, Marc E. Gillespie, Akira Funahashi, Marcio Luis Acencio, Ahmed Hemedan, Michael Aichem, Karsten Klein, Tobias Czauderna, Felicia Burtscher, Takahiro G. Yamada, Yusuke Hiki, Noriko F. Hiroi, Finterly Hu, Nhung Pham, Friederike Ehrhart, Egon L. Willighagen, Alberto Valdeolivas, Aurelien Dugourd, Francesco Messina, Marina Esteban-Medina, Maria Peña-Chilet, Kinza Rian, Sylvain Soliman, Sara Sadat Aghamiri, Bhanwar Lal Puniya, Aurélien Naldi, Tomáš Helikar, Vidisha Singh, Marco Fariñas Fernández, Viviam Bermudez, Eirini Tsirvouli, Arnau Montagud, Vincent Noël, Miguel Ponce de Leon, Dieter Maier, Angela Bauch, Benjamin M. Gyori, John A. Bachman, Augustin Luna, Janet Pinero, Laura I. Furlong, Irina Balaur, Adrien Rougny, Yohan Jarosz, Rupert W. Overall, Robert Phair, Livia Perfetto, Lisa Matthews, Devasahayam Arokia Balaya Rex, Marija Orlic-Milacic, Monraz Gomez Luis Cristobal, Bertrand De Meulder, Jean Marie Ravel, Bijay Jassal, Venkata Satagopam, Guanming Wu, Martin Golebiewski, Piotr Gawron, Laurence Calzone, Jacques S. Beckmann, Chris T. Evelo, Peter D’Eustachio, Falk Schreiber, Julio Saez-Rodriguez, Joaquin Dopazo, Martin Kuiper, Alfonso Valencia, Olaf Wolkenhauer, Hiroaki Kitano, Emmanuel Barillot, Charles Auffray, Rudi Balling, Reinhard Schneider

Date Published: 19th Dec 2022

Publication Type: Misc

Abstract

The report focusses on national and EU case studies (good practice examples) for integrating patient-derived data, such as phenotype and large-scale data, for in silico modelling in personalized medicine.

Authors: Martin Golebiewski, Marc Kirschner, Sylvia Krobitsch, EU-STANDS4PM consortium

Date Published: 15th Dec 2022

Publication Type: Tech report

Abstract

This document specifies requirements for the consistent formatting and documentation of data and corresponding metadata (i.e. data describing the data and its context) in the life sciences, including biotechnology, and biomedical, as well as non-human biological research and development. It provides guidance on rendering data in the life sciences findable, accessible, interoperable and reusable (F-A-I-R). This document is applicable to manual or computational workflows that systematically capture, record or integrate data and corresponding metadata in the life sciences for other purposes. This document provides formatting requirements for both primary experimental or procedural data obtained manually and machine derived data. This document also describes requirements for storing, sharing, accessing, interoperability and reuse of data and corresponding metadata in the life sciences. This document specifies requirements for large quantities of data systematically obtained from automated high throughput workflows in the life sciences, as well as requirements for large-scale and small-scale data sets obtained by other life science technologies and manual data capture. This document is applicable to many domains in biotechnology and the life sciences including, but not limited to: basic/applied research in all domains of the life sciences, and industrial, medical, agricultural, or environmental biotechnology (excluding for diagnostic or therapeutic purposes), as well as methodology-driven domains, such as genomics (including massive parallel sequencing, metagenomics, epigenomics and functional genomics), transcriptomics, translatomics, proteomics, metabolomics, lipidomics, glycomics, enzymology, immunochemistry, synthetic biology, systems biology, systems medicine and related fields.

Author: Martin Golebiewski

Date Published: 4th Nov 2022

Publication Type: Manual

Copyright © 2008 - 2023 The University of Manchester and HITS gGmbH