Publications

Abstract

Fine-tuning biomedical pre-trained language models (BioPLMs) such as BioBERT has become a common practice dominating leaderboards across various natural language processing tasks. Despite their success and wide adoption, prevailing fine-tuning approaches for named entity recognition (NER) naively train BioPLMs on targeted datasets without considering class distributions. This is problematic especially when dealing with imbalanced biomedical gold-standard datasets for NER in which most biomedical entities are underrepresented. In this paper, we address the class imbalance problem and propose WeLT, a cost-sensitive fine-tuning approach based on new re-scaled class weights for the task of biomedical NER. We evaluate WeLT’s fine-tuning performance on mixed-domain and domain-specific BioPLMs using eight biomedical gold-standard datasets. We compare our approach against vanilla fine-tuning and three other existing re-weighting schemes. Our results show the positive impact of handling the class imbalance problem. WeLT outperforms all the vanilla fine-tuned models. Furthermore, our method demonstrates advantages over other existing weighting schemes in most experiments.

Authors: Ghadeer Mobasher, Wolfgang Müller, Olga Krebs, Michael Gertz

Date Published: 2023

Publication Type: Proceedings

Abstract

SABIO-RK represents a repository for structured, curated, and annotated data on reactions and their kinetics. The data are manually extracted from the scientific literature and stored in a relational database. The content comprises both naturally occurring and alternatively measured biochemical reactions, and the data are made available to the public via a web-based search interface as well as easy-to-use JSON web services for programmatic access. Data are highly interlinked to external databases, ontologies, and controlled vocabularies. These include cross-references to, e.g., UniProt, ChEBI, KEGG, BRENDA, BioModels, and MetaNetX. In the past year we have worked on improving the findability and interoperability of SABIO-RK data: SABIO-RK was extended to read the additional annotations in the EnzymeML data exchange format to allow the direct import of enzymology data from EnzymeML documents. SABIO-RK is part of the EnzymeML workflow to support the data transfer between experimental platforms, modelling tools and databases (Range et al. FEBS J 2021). In the BMBF-funded project SABIO-VIS we focused on visualizing SABIO-RK data for the purpose of interactive search and search refinement.

Authors: Andreas Weidemann, Dorotea Dudas, Maja Rey, Ulrike Wittig, Wolfgang Müller

Date Published: 1st Aug 2022

Publication Type: InCollection

Abstract

Not specified

Authors: Sucheta Ghosh, Wolfgang Müller, Ulrike Wittig, Maja Rey

Date Published: 5th May 2022

Publication Type: InProceedings

Abstract

BACKGROUND: Although decision-makers in health care settings need to read and understand the validity of quantitative reports, they do not always carefully read information on research methods. Presenting the methods in a more structured way could improve the time spent reading the methods and increase the perceived relevance of this important report section. OBJECTIVE: To test the effect of a structured summary of the methods used in a quantitative data report on reading behavior with eye-tracking and measure the effect on the perceived importance of this section. METHODS: A nonrandomized pilot trial was performed in a computer laboratory setting with advanced medical students. All participants were asked to read a quantitative data report; an intervention arm was also shown a textbox summarizing key features of the methods used in the report. Three data-collection methods were used to document reading behavior and the views of participants: eye-tracking (during reading), a written questionnaire, and a face-to-face interview. RESULTS: We included 35 participants, 22 in the control arm and 13 in the intervention arm. The overall time spent reading the methods did not differ between the 2 arms. The intervention arm considered the information in the methods section to be less helpful for decision-making than did the control arm (scores for perceived helpfulness were 4.1 and 2.9, respectively, range 1-10). Participants who read the box more intensively tended to spend more time on the methods as a whole (Pearson correlation 0.81, P=.001). CONCLUSIONS: Adding a structured summary of information on research methods attracted attention from most participants, but did not increase the time spent on reading the methods or lead to increased perceptions that the methods section was helpful for decision-making. Participants made use of the summary to quickly judge the methods, but this did not increase the perceived relevance of this section.

Authors: J. Koetsenruijter, P. Wronski, S. Ghosh, W. Muller, M. Wensing

Date Published: 12th Apr 2022

Publication Type: Journal

Abstract

In this white paper, we describe the founding of a new ELIXIR Community - the Systems Biology Community - and its proposed future contributions to both ELIXIR and the broader community of systems biologists in Europe and worldwide. The Community believes that the infrastructure aspects of systems biology - databases, (modelling) tools and standards development, as well as training and access to cloud infrastructure - are not only appropriate components of the ELIXIR infrastructure, but will prove key components of ELIXIR’s future support of advanced biological applications and personalised medicine. By way of a series of meetings, the Community identified seven key areas for its future activities, reflecting both future needs and previous and current activities within ELIXIR Platforms and Communities. These are: overcoming barriers to the wider uptake of systems biology; linking new and existing data to systems biology models; interoperability of systems biology resources; further development and embedding of systems medicine; provisioning of modelling as a service; building and coordinating capacity building and training resources; and supporting industrial embedding of systems biology. A set of objectives for the Community has been identified under four main headline areas: Standardisation and Interoperability, Technology, Capacity Building and Training, and Industrial Embedding. These are grouped into short-term (3-year), mid-term (6-year) and long-term (10-year) objectives.

Authors: Vitor Martins dos Santos, Mihail Anton, Barbara Szomolay, Marek Ostaszewski, Ilja Arts, Rui Benfeitas, Victoria Dominguez Del Angel, Polonca Ferk, Dirk Fey, Carole Goble, Martin Golebiewski, Kristina Gruden, Katharina F. Heil, Henning Hermjakob, Pascal Kahlem, Maria I. Klapa, Jasper Koehorst, Alexey Kolodkin, Martina Kutmon, Brane Leskošek, Sébastien Moretti, Wolfgang Müller, Marco Pagni, Tadeja Rezen, Miguel Rocha, Damjana Rozman, David Šafránek, Rahuman S. Malik Sheriff, Maria Suarez Diez, Kristel Van Steen, Hans V Westerhoff, Ulrike Wittig, Katherine Wolstencroft, Anze Zupanic, Chris T. Evelo, John M. Hancock

Date Published: 2022

Publication Type: Journal

Abstract

Background: Quantitative data reports are widely produced to inform health policy decisions. Policymakers are expected to critically assess provided information in order to incorporate the best available evidence into the decision-making process. Many other factors are known to influence this process, but little is known about how quantitative data reports are actually read. We explored the reading behavior of (future) health policy decision-makers, using innovative methods. Methods: We conducted a computer-assisted laboratory study, involving starting and advanced students in medicine and health sciences, and professionals as participants. They read a quantitative data report to inform a decision on the use of resources for long-term care in dementia in a hypothetical decision scenario. Data were collected through eye-tracking, questionnaires, and a brief interview. Eye-tracking data were used to generate ‘heatmaps’ and five measures of reading behavior. The questionnaires provided participants’ perceptions of understandability and helpfulness as well as individual characteristics. Interviews documented reasons for attention to specific report sections. The quantitative analysis was largely descriptive, complemented by Pearson correlations. Interviews were analyzed by qualitative content analysis. Results: In total, 46 individuals participated [students (85%), professionals (15%)]. Eye-tracking observations showed that the participants spent equal time and attention for most parts of the presented report, but were less focused when reading the methods section. The qualitative content analysis identified 29 reasons for attention to a report section related to four topics. Eye-tracking measures were largely unrelated to participants’ perceptions of understandability and helpfulness of the report. Conclusions: Eye-tracking data added information on reading behaviors that were not captured by questionnaires or interviews with health decision-makers.

Authors: Pamela Wronski, Michel Wensing, Sucheta Ghosh, Lukas Gärttner, Wolfgang Müller, Jan Koetsenruijter

Date Published: 1st Dec 2021

Publication Type: Journal

Abstract

Chemical named entity recognition (NER) is a significant step for many downstream applications like entity linking for the chemical text-mining pipeline. However, the identification of chemical entities in a biomedical text is a challenging task due to the diverse morphology of chemical entities and the different types of chemical nomenclature. In this work, we describe our approach that was submitted for the BioCreative version 7 challenge, Track 2, focusing on the ‘Chemical Identification’ task for identifying chemical entities and entity linking, using MeSH. For this purpose, we have applied a two-stage approach as follows: (a) use of a fine-tuned BioBERT for identification of chemical entities; (b) semantic approximate search in MeSH and PubChem databases for entity linking. There was some friction between the two approaches, as our rule-based approach did not harmonise optimally with partially recognized words forwarded by the BERT component. For our future work, we aim to resolve the issue of the artefacts arising from BERT tokenizers, develop joint learning of chemical named entity recognition and entity linking using pre-trained transformer-based models, and compare their performance with our preliminary approach. Next, we will improve the efficiency of our approximate search in reference databases during entity linking. This task is non-trivial as it entails determining similarity scores of large sets of trees with respect to a query tree. Ideally, this will enable flexible parametrization and rule selection for the entity linking search.

Authors: Ghadeer Mobasher, Lukrécia Mertová, Sucheta Ghosh, Olga Krebs, Bettina Heinlein, Wolfgang Müller

Date Published: 11th Nov 2021

Publication Type: Proceedings
