Publications

Abstract

Fine-tuning biomedical pre-trained language models (BioPLMs) such as BioBERT has become a common practice dominating leaderboards across various natural language processing tasks. Despite their success and wide adoption, prevailing fine-tuning approaches for named entity recognition (NER) naively train BioPLMs on targeted datasets without considering class distributions. This is problematic especially when dealing with imbalanced biomedical gold-standard datasets for NER in which most biomedical entities are underrepresented. In this paper, we address the class imbalance problem and propose WeLT, a cost-sensitive fine-tuning approach based on new re-scaled class weights for the task of biomedical NER. We evaluate WeLT’s fine-tuning performance on mixed-domain and domain-specific BioPLMs using eight biomedical gold-standard datasets. We compare our approach against vanilla fine-tuning and three other existing re-weighting schemes. Our results show the positive impact of handling the class imbalance problem. WeLT outperforms all the vanilla fine-tuned models. Furthermore, our method demonstrates advantages over other existing weighting schemes in most experiments.
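The abstract does not state how WeLT re-scales its class weights, so the following is only a minimal sketch of the general idea behind cost-sensitive training: weighting the cross-entropy loss so that rare entity classes count more than the dominant "O" class. The inverse-frequency weighting, the function names, and the toy labels are all assumptions made for illustration; this is not the WeLT scheme.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed weighting: total / (num_classes * class_count), so rare classes get larger weights."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalized by the sum of the weights used."""
    numerator = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    denominator = sum(weights[y] for y in labels)
    return numerator / denominator


# Hypothetical token labels: "O" dominates, entity tags are rare.
labels = ["O"] * 8 + ["B-Chem", "I-Chem"]
weights = inverse_frequency_weights(labels)

# Hypothetical model output: a uniform distribution over the three classes for every token.
uniform = {"O": 1 / 3, "B-Chem": 1 / 3, "I-Chem": 1 / 3}
probs = [uniform] * len(labels)
loss = weighted_cross_entropy(probs, labels, weights)
```

With weights like these, a misclassified entity token contributes more to the loss than a misclassified "O" token, which is the effect any re-weighting scheme for imbalanced NER data aims for.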

Authors: Ghadeer Mobasher, Wolfgang Müller, Olga Krebs, Michael Gertz

Date Published: 2023

Publication Type: Proceedings

Abstract

The COVID-19 Disease Map project is a large-scale community effort uniting 277 scientists from 130 institutions around the globe. We use high-quality, mechanistic content describing SARS-CoV-2-host interactions and develop interoperable bioinformatic pipelines for novel target identification and drug repurposing. Community-driven and highly interdisciplinary, the project is collaborative and supports community standards, open access, and the FAIR data principles. The coordination of community work allowed for an impressive step forward in building interfaces between Systems Biology tools and platforms. Our framework links key molecules highlighted from broad omics data analysis and computational modeling to dysregulated pathways in a cell-, tissue- or patient-specific manner. We also employ text mining and AI-assisted analysis to identify potential drugs and drug targets and use topological analysis to reveal interesting structural features of the map. The proposed framework is versatile and expandable, offering a significant upgrade in the arsenal used to understand virus-host interactions and other complex pathologies.

Authors: Anna Niarakis, Marek Ostaszewski, Alexander Mazein, Inna Kuperstein, Martina Kutmon, Marc E. Gillespie, Akira Funahashi, Marcio Luis Acencio, Ahmed Hemedan, Michael Aichem, Karsten Klein, Tobias Czauderna, Felicia Burtscher, Takahiro G. Yamada, Yusuke Hiki, Noriko F. Hiroi, Finterly Hu, Nhung Pham, Friederike Ehrhart, Egon L. Willighagen, Alberto Valdeolivas, Aurelien Dugourd, Francesco Messina, Marina Esteban-Medina, Maria Peña-Chilet, Kinza Rian, Sylvain Soliman, Sara Sadat Aghamiri, Bhanwar Lal Puniya, Aurélien Naldi, Tomáš Helikar, Vidisha Singh, Marco Fariñas Fernández, Viviam Bermudez, Eirini Tsirvouli, Arnau Montagud, Vincent Noël, Miguel Ponce de Leon, Dieter Maier, Angela Bauch, Benjamin M. Gyori, John A. Bachman, Augustin Luna, Janet Pinero, Laura I. Furlong, Irina Balaur, Adrien Rougny, Yohan Jarosz, Rupert W. Overall, Robert Phair, Livia Perfetto, Lisa Matthews, Devasahayam Arokia Balaya Rex, Marija Orlic-Milacic, Monraz Gomez Luis Cristobal, Bertrand De Meulder, Jean Marie Ravel, Bijay Jassal, Venkata Satagopam, Guanming Wu, Martin Golebiewski, Piotr Gawron, Laurence Calzone, Jacques S. Beckmann, Chris T. Evelo, Peter D’Eustachio, Falk Schreiber, Julio Saez-Rodriguez, Joaquin Dopazo, Martin Kuiper, Alfonso Valencia, Olaf Wolkenhauer, Hiroaki Kitano, Emmanuel Barillot, Charles Auffray, Rudi Balling, Reinhard Schneider

Date Published: 19th Dec 2022

Publication Type: Misc

Abstract

The report focusses on national and EU case studies (good practice examples) for integrating patient-derived data, such as phenotype and large-scale data, for in silico modelling in personalized medicine.

Authors: Martin Golebiewski, Marc Kirschner, Sylvia Krobitsch, EU-STANDS4PM consortium

Date Published: 15th Dec 2022

Publication Type: Tech report

Abstract

This document specifies requirements for the consistent formatting and documentation of data and corresponding metadata (i.e. data describing the data and its context) in the life sciences, including biotechnology, and biomedical, as well as non-human biological research and development. It provides guidance on rendering data in the life sciences findable, accessible, interoperable and reusable (F-A-I-R). This document is applicable to manual or computational workflows that systematically capture, record or integrate data and corresponding metadata in the life sciences for other purposes. This document provides formatting requirements for both primary experimental or procedural data obtained manually and machine derived data. This document also describes requirements for storing, sharing, accessing, interoperability and reuse of data and corresponding metadata in the life sciences. This document specifies requirements for large quantities of data systematically obtained from automated high throughput workflows in the life sciences, as well as requirements for large-scale and small-scale data sets obtained by other life science technologies and manual data capture. This document is applicable to many domains in biotechnology and the life sciences including, but not limited to: basic/applied research in all domains of the life sciences, and industrial, medical, agricultural, or environmental biotechnology (excluding for diagnostic or therapeutic purposes), as well as methodology-driven domains, such as genomics (including massive parallel sequencing, metagenomics, epigenomics and functional genomics), transcriptomics, translatomics, proteomics, metabolomics, lipidomics, glycomics, enzymology, immunochemistry, synthetic biology, systems biology, systems medicine and related fields.

Author: Martin Golebiewski

Date Published: 4th Nov 2022

Publication Type: Manual

Abstract

SABIO-RK represents a repository for structured, curated, and annotated data on reactions and their kinetics. The data are manually extracted from the scientific literature and stored in a relational database. The content comprises both naturally occurring and alternatively measured biochemical reactions, and the data are made available to the public via a web-based search interface as well as easy-to-use JSON web services for programmatic access. Data are highly interlinked to external databases, ontologies, and controlled vocabularies. This includes cross-references with e.g. UniProt, ChEBI, KEGG, BRENDA, BioModels, and MetaNetX. In the past year we have worked on improving findability of SABIO-RK data as well as interoperability: SABIO-RK was extended to read the additional annotations in the EnzymeML data exchange format to allow the direct import of enzymology data from EnzymeML documents. SABIO-RK is part of the EnzymeML workflow to support the data transfer between experimental platforms, modelling tools and databases (Range et al. FEBS J 2021). In the BMBF-funded project SABIO-VIS we focused on visualizing SABIO-RK data for the purpose of interactive search and search refinement.

Authors: Andreas Weidemann, Dorotea Dudas, Maja Rey, Ulrike Wittig, Wolfgang Müller

Date Published: 1st Aug 2022

Publication Type: InCollection

Abstract

Not specified

Authors: Sucheta Ghosh, Wolfgang Müller, Ulrike Wittig, Maja Rey

Date Published: 5th May 2022

Publication Type: InProceedings

Abstract

BACKGROUND: Although decision-makers in health care settings need to read and understand the validity of quantitative reports, they do not always carefully read information on research methods. Presenting the methods in a more structured way could improve the time spent reading the methods and increase the perceived relevance of this important report section. OBJECTIVE: To test the effect of a structured summary of the methods used in a quantitative data report on reading behavior with eye-tracking and measure the effect on the perceived importance of this section. METHODS: A nonrandomized pilot trial was performed in a computer laboratory setting with advanced medical students. All participants were asked to read a quantitative data report; an intervention arm was also shown a textbox summarizing key features of the methods used in the report. Three data-collection methods were used to document reading behavior and the views of participants: eye-tracking (during reading), a written questionnaire, and a face-to-face interview. RESULTS: We included 35 participants, 22 in the control arm and 13 in the intervention arm. The overall time spent reading the methods did not differ between the 2 arms. The intervention arm considered the information in the methods section to be less helpful for decision-making than did the control arm (scores for perceived helpfulness were 4.1 and 2.9, respectively, range 1-10). Participants who read the box more intensively tended to spend more time on the methods as a whole (Pearson correlation 0.81, P=.001). CONCLUSIONS: Adding a structured summary of information on research methods attracted attention from most participants, but did not increase the time spent on reading the methods or lead to increased perceptions that the methods section was helpful for decision-making. Participants made use of the summary to quickly judge the methods, but this did not increase the perceived relevance of this section.

Authors: J. Koetsenruijter, P. Wronski, S. Ghosh, W. Muller, M. Wensing

Date Published: 12th Apr 2022

Publication Type: Journal

Copyright © 2008 - 2023 The University of Manchester and HITS gGmbH