  • Publication
    On the effects of biased quantum random numbers on the initialization of artificial neural networks
    Recent advances in practical quantum computing have led to a variety of cloud-based quantum computing platforms that allow researchers to evaluate their algorithms on noisy intermediate-scale quantum devices. A common property of quantum computers is that they can exhibit instances of true randomness as opposed to pseudo-randomness obtained from classical systems. Investigating the effects of such true quantum randomness in the context of machine learning is appealing, and recent results vaguely suggest that benefits can indeed be achieved from the use of quantum random numbers. To shed some more light on this topic, we empirically study the effects of hardware-biased quantum random numbers on the initialization of artificial neural network weights in numerical experiments. We find no statistically significant difference in comparison with unbiased quantum random numbers as well as biased and unbiased random numbers from a classical pseudo-random number generator. The quantum random numbers for our experiments are obtained from real quantum hardware.
  • Publication
    How Does Knowledge Injection Help in Informed Machine Learning?
    Informed machine learning describes the injection of prior knowledge into learning systems. It can help to improve generalization, especially when training data is scarce. However, the field is so application-driven that general analyses about the effect of knowledge injection are rare. This makes it difficult to transfer existing approaches to new applications, or to estimate potential improvements. Therefore, in this paper, we present a framework for quantifying the value of prior knowledge in informed machine learning. Our main contributions are threefold. Firstly, we propose a set of relevant metrics for quantifying the benefits of knowledge injection, comprising in-distribution accuracy, out-of-distribution robustness, and knowledge conformity. We also introduce a metric that combines performance improvement and data reduction. Secondly, we present a theoretical framework that represents prior knowledge in a function space and relates it to data representations and a trained model. This suggests that the distances between knowledge and data influence potential model improvements. Thirdly, we perform a systematic experimental study with controllable toy problems. All in all, this helps to find general answers to the question of how knowledge injection helps in informed machine learning.
  • Publication
    A machine learning method for the identification and characterization of novel COVID-19 drug targets
    (2023-05-03)
    Delong, Lauren Nicole; Masny, Aliaksandr; Lentzen, Manuel; Dijk, David van; Hansen, Anne Funck; Kannt, Aimo; Foldenauer, Ann Christina; Resch, Eduard; Frank, Kevin; Laue, Hendrik; Hirsch, Jochen; Wischnewski, Marco; Tom Kodamullil, Alpha; Gemünd, Andre; Fluck, Juliane; Steinborn, Carina; Hermanowski, Helena; Klein, Jürgen; Knieps, Meike; Wendland, Philipp Johannes; Wegner, Philipp
    In addition to vaccines, the World Health Organization sees novel medications as an urgent matter to fight the ongoing COVID-19 pandemic. One possible strategy is to identify target proteins, for which a perturbation by an existing compound is likely to benefit COVID-19 patients. In order to contribute to this effort, we present GuiltyTargets-COVID-19 (https://guiltytargets-covid.eu/), a machine learning supported web tool to identify novel candidate drug targets. Using six bulk and three single cell RNA-Seq datasets, together with a lung tissue specific protein-protein interaction network, we demonstrate that GuiltyTargets-COVID-19 is capable of (i) prioritizing meaningful target candidates and assessing their druggability, (ii) unraveling their linkage to known disease mechanisms, (iii) mapping ligands from the ChEMBL database to the identified targets, and (iv) pointing out potential side effects in the case that the mapped ligands correspond to approved drugs. Our example analyses identified 4 potential drug targets from the datasets: AKT3 from both the bulk and single cell RNA-Seq data as well as AKT2, MLKL, and MAPK11 in the single cell experiments. Altogether, we believe that our web tool will facilitate future target identification and drug development for COVID-19, notably in a cell type and tissue specific manner.
  • Publication
    Feature selection on quantum computers
    (2023-02-20)
    Mücke, Sascha; Müller, Sabine
    In machine learning, fewer features reduce model complexity. Carefully assessing the influence of each input feature on the model quality is therefore a crucial preprocessing step. We propose a novel feature selection algorithm based on a quadratic unconstrained binary optimization (QUBO) problem, which allows selecting a specified number of features based on their importance and redundancy. In contrast to iterative or greedy methods, our direct approach yields higher-quality solutions. QUBO problems are particularly interesting because they can be solved on quantum hardware. To evaluate our proposed algorithm, we conduct a series of numerical experiments using a classical computer, a quantum gate computer, and a quantum annealer. Our evaluation compares our method to a range of standard methods on various benchmark data sets. We observe competitive performance.
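The abstract above does not spell out the QUBO formulation, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a matrix `Q` whose diagonal rewards feature importance and whose off-diagonal entries penalize pairwise redundancy, with a hypothetical trade-off weight `alpha`. The function names, the weighting scheme, and the brute-force search over subsets of size `k` (a classical stand-in for a quantum solver, feasible only for a handful of features) are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def build_qubo(importance, redundancy, alpha=0.5):
    """Build an illustrative QUBO matrix (assumed weighting, not from the paper).

    Diagonal:      -alpha * importance[i]            (reward important features)
    Off-diagonal:  (1 - alpha) * redundancy[i, j]    (penalize redundant pairs)
    """
    importance = np.asarray(importance, dtype=float)
    q = (1.0 - alpha) * np.asarray(redundancy, dtype=float)
    np.fill_diagonal(q, -alpha * importance)
    return q


def qubo_energy(q, x):
    """Evaluate x^T Q x for a binary selection vector x."""
    return float(x @ q @ x)


def select_features(q, k):
    """Exhaustively search all subsets of exactly k features for the lowest energy."""
    n = q.shape[0]
    best_subset, best_energy = None, float("inf")
    for subset in combinations(range(n), k):
        x = np.zeros(n)
        x[list(subset)] = 1.0
        energy = qubo_energy(q, x)
        if energy < best_energy:
            best_subset, best_energy = subset, energy
    return best_subset, best_energy


# Toy data: features 0 and 1 are both important but highly redundant.
importance = [0.9, 0.8, 0.1]
redundancy = np.array([
    [0.0, 0.95, 0.0],
    [0.95, 0.0, 0.0],
    [0.0, 0.0, 0.0],
])
subset, energy = select_features(build_qubo(importance, redundancy), k=2)
```

With this toy input the search prefers features 0 and 2 over the redundant pair 0 and 1, since a symmetric redundancy entry is counted twice in x^T Q x and outweighs the importance of feature 1. On quantum hardware the same matrix would be handed to an annealer or a gate-based optimizer instead of being enumerated.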
  • Publication
    Informed Machine Learning - A Taxonomy and Survey of Integrating Prior Knowledge into Learning Systems
    Despite its great success, machine learning can have its limits when dealing with insufficient training data. A potential solution is the additional integration of prior knowledge into the training process which leads to the notion of informed machine learning. In this paper, we present a structured overview of various approaches in this field. We provide a definition and propose a concept for informed machine learning which illustrates its building blocks and distinguishes it from conventional machine learning. We introduce a taxonomy that serves as a classification framework for informed machine learning approaches. It considers the source of knowledge, its representation, and its integration into the machine learning pipeline. Based on this taxonomy, we survey related research and describe how different knowledge representations such as algebraic equations, logic rules, or simulation results can be used in learning systems. This evaluation of numerous papers on the basis of our taxonomy uncovers key methods in the field of informed machine learning.
  • Publication
    Quantum Feature Selection
    (2022-03-24)
    Mücke, Sascha; Müller, Sabine
    In machine learning, fewer features reduce model complexity. Carefully assessing the influence of each input feature on the model quality is therefore a crucial preprocessing step. We propose a novel feature selection algorithm based on a quadratic unconstrained binary optimization (QUBO) problem, which allows selecting a specified number of features based on their importance and redundancy. In contrast to iterative or greedy methods, our direct approach yields higher-quality solutions. QUBO problems are particularly interesting because they can be solved on quantum hardware. To evaluate our proposed algorithm, we conduct a series of numerical experiments using a classical computer, a quantum gate computer, and a quantum annealer. Our evaluation compares our method to a range of standard methods on various benchmark datasets. We observe competitive performance.
  • Publication
    Benchmarking table recognition performance on biomedical literature on neurological disorders
    Table recognition systems are widely used to extract and structure quantitative information from the vast amount of documents that are increasingly available from different open sources. While many systems already perform well on tables with a simple layout, tables in the biomedical domain are often much more complex. Benchmark and training data for such tables are, however, very limited. To address this issue, we present a novel, highly curated benchmark dataset based on a hand-curated literature corpus on neurological disorders, which can be used to tune and evaluate table extraction applications for this challenging domain. We evaluate several state-of-the-art table extraction systems based on our proposed benchmark and discuss challenges that emerged during the benchmark creation as well as factors that can impact the performance of recognition methods. For the evaluation procedure, we propose a new metric as well as several improvements that result in a better performance evaluation. The resulting benchmark dataset (https://zenodo.org/record/5549977) as well as the source code to our novel evaluation approach can be openly accessed. Supplementary data are available at Bioinformatics online.
  • Publication
    A hybrid approach for identifying drug repurposing candidates and their mechanisms
    (2022)
    Lage-Rupprecht, Vanessa
    Senior researcher Vanessa Lage-Rupprecht and two collaborators talk about what data science means to them and illustrate how they managed to create a data and lab coexistence in their drug-repurposing project, which was recently published in Patterns. In this article, they have developed a drug-target-mechanism-oriented data model, Human Brain PHARMACOME, and have presented it as a resource to the community.
  • Publication
    A hybrid approach unveils drug repurposing candidates targeting an Alzheimer pathophysiology mechanism
    (2022)
    Lage-Rupprecht, Vanessa; Dick, Justus; Gebel, Stephan; Pless, Ole; Reinshagen, Jeanette
    The high number of failed pre-clinical and clinical studies for compounds targeting Alzheimer disease (AD) has demonstrated that there is a need to reassess existing strategies. Here, we pursue a holistic, mechanism-centric drug repurposing approach combining computational analytics and experimental screening data. Based on this integrative workflow, we identified 77 druggable modifiers of tau phosphorylation (pTau). One of the upstream modulators of pTau, HDAC6, was screened with 5,632 drugs in a tau-specific assay, resulting in the identification of 20 repurposing candidates. Four compounds and their known targets were found to have a link to AD-specific genes. Our approach can be applied to a variety of AD-associated pathophysiological mechanisms to identify more repurposing candidates.
  • Publication
    CLEP: A hybrid data- and knowledge-driven framework for generating patient representations
    (2021-05-08)
    Ali, Mehdi; Hoyt, Charles Tapley; Domingo-Fernández, Daniel
    As machine learning and artificial intelligence increasingly attain a larger number of applications in the biomedical domain, at their core, their utility depends on the data used to train them. Due to the complexity and high dimensionality of biomedical data, there is a need for approaches that combine prior knowledge around known biological interactions with patient data. Here, we present CLinical Embedding of Patients (CLEP), a novel approach that generates new patient representations by leveraging both prior knowledge and patient-level data. First, given a patient-level dataset and a knowledge graph containing relations across features that can be mapped to the dataset, CLEP incorporates patients into the knowledge graph as new nodes connected to their most characteristic features. Next, CLEP employs knowledge graph embedding models to generate new patient representations that can ultimately be used for a variety of downstream tasks, ranging from clustering to classification. We demonstrate how using new patient representations generated by CLEP significantly improves performance in classifying between patients and healthy controls for a variety of machine learning models, as compared to the use of the original transcriptomics data. Furthermore, we also show how incorporating patients into a knowledge graph can foster the interpretation and identification of biological features characteristic of a specific disease or patient subgroup. Finally, we released CLEP as an open source Python package together with examples and documentation.
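The first CLEP step described above, adding patients to a knowledge graph as nodes linked to their most characteristic features, might be sketched as follows. This is not the released CLEP package or its API. The abstract does not say how "most characteristic" is determined, so the sketch assumes the `top_k` features with the largest absolute z-score across the cohort. The function name and the relation labels `"up"` and `"down"` are hypothetical.

```python
import numpy as np


def add_patients_to_graph(triples, values, patient_ids, feature_names, top_k=2):
    """Return the feature-feature triples plus new patient-feature triples.

    triples:       list of (head, relation, tail) among features
    values:        array of shape (n_patients, n_features)
    Assumption: a feature is "characteristic" for a patient if its z-score
    across the cohort is among the top_k largest in absolute value.
    """
    values = np.asarray(values, dtype=float)
    mean = values.mean(axis=0)
    std = values.std(axis=0)
    std[std == 0] = 1.0  # constant features get a z-score of 0
    z_scores = (values - mean) / std

    augmented = list(triples)
    for row, patient in zip(z_scores, patient_ids):
        # Indices of the top_k features by absolute z-score.
        top = np.argsort(-np.abs(row), kind="stable")[:top_k]
        for j in top:
            relation = "up" if row[j] > 0 else "down"  # hypothetical labels
            augmented.append((patient, relation, feature_names[j]))
    return augmented


# Toy example: three patients, three features, one prior-knowledge edge.
graph = [("A", "interacts_with", "B")]
expression = [[10, 1, 5], [1, 1, 5], [1, 10, 5]]
result = add_patients_to_graph(
    graph, expression, ["p1", "p2", "p3"], ["A", "B", "C"], top_k=1
)
```

The augmented triple list could then be passed to a knowledge graph embedding model, and the learned embeddings of the patient nodes would serve as the new patient representations for downstream clustering or classification.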