  • Publication
    Anonymization of German financial documents using neural network-based language models with contextual word representations
    The automation and digitalization of business processes have increased the need for efficient information extraction from business documents. However, financial and legal documents are often not used effectively by text processing or machine learning systems, partly because they contain sensitive information that restricts their use to authorized parties and purposes. To overcome this limitation, we develop an anonymization method for German financial and legal documents using state-of-the-art natural language processing methods based on recurrent neural networks and transformer architectures. We present a web-based application to anonymize financial documents and a large-scale evaluation of different deep learning techniques.
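The abstract does not describe how detected sensitive spans are replaced. The minimal sketch below assumes that a sequence-labelling model (not shown, and not necessarily the one used in the paper) has already produced character-offset entity spans. It only illustrates how those spans could be swapped for typed placeholders. The names `EntitySpan` and `anonymize` and the labels `ORG`/`PER` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntitySpan:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str  # entity type from a hypothetical tag set, e.g. "PER" or "ORG"


def anonymize(text: str, spans: list[EntitySpan]) -> str:
    """Replace each detected span with a typed placeholder such as "[ORG]"."""
    pieces = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < cursor:
            continue  # skip spans that overlap one already replaced
        pieces.append(text[cursor:span.start])  # keep text before the span
        pieces.append(f"[{span.label}]")        # replace the span itself
        cursor = span.end
    pieces.append(text[cursor:])  # keep the remainder after the last span
    return "".join(pieces)


if __name__ == "__main__":
    text = "Die Muster GmbH zahlt 500 EUR an Max Mustermann."
    org = text.find("Muster GmbH")
    per = text.find("Max Mustermann")
    spans = [
        EntitySpan(per, per + len("Max Mustermann"), "PER"),
        EntitySpan(org, org + len("Muster GmbH"), "ORG"),
    ]
    print(anonymize(text, spans))
```

Typed placeholders, rather than blanking, keep the document readable for downstream extraction while hiding who or what is involved.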
  • Publication
    Some applications of heat flow to Laplace eigenfunctions
    ( 2022) ;
    Mukherjee, M.
    We consider mass concentration properties of Laplace eigenfunctions $\varphi_\lambda$, that is, smooth functions satisfying the equation $-\Delta\varphi_\lambda = \lambda\varphi_\lambda$, on a smooth closed Riemannian manifold. Using a heat diffusion technique, we first discuss mass concentration/localization properties of eigenfunctions around their nodal sets. Second, we discuss the problem of avoided crossings and (non)existence of nodal domains which continue to be thin over relatively long distances. Further, using the above techniques, we discuss the decay of Laplace eigenfunctions on Euclidean domains which have a central "thick" part and "thin" elongated branches representing tunnels of sub-wavelength opening. Finally, in an Appendix, we record some new observations regarding sub-level sets of the eigenfunctions and interactions of different level sets.
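Heat-flow arguments of this kind typically start from a standard observation: under the heat semigroup $e^{t\Delta}$, an eigenfunction only decays by a scalar factor. Here $\Delta$ is the Laplace–Beltrami operator with the sign convention that makes $\lambda \ge 0$. On a nodal domain $\Omega$, the Feynman–Kac formula then expresses $\varphi_\lambda$ through Brownian motion $\omega$ generated by $\Delta$ and its first exit time $\tau_\Omega$. Only these textbook identities are stated below. How the paper turns them into concentration estimates is not reproduced here.

```latex
-\Delta\varphi_\lambda = \lambda\varphi_\lambda
\quad\Longrightarrow\quad
e^{t\Delta}\varphi_\lambda = e^{-\lambda t}\,\varphi_\lambda, \qquad t \ge 0,

\varphi_\lambda(x) = e^{\lambda t}\,
\mathbb{E}_x\!\left[\varphi_\lambda(\omega(t))\,\mathbf{1}_{\{t < \tau_\Omega\}}\right],
\qquad x \in \Omega .
```

In the second identity, the factor $e^{\lambda t}$ balances the probability that the Brownian path has left $\Omega$ by time $t$. Heuristically, this is why thin parts of a nodal domain, from which paths escape quickly, constrain how much mass the eigenfunction can carry there.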
  • Publication
    Benchmarking table recognition performance on biomedical literature on neurological disorders
    Table recognition systems are widely used to extract and structure quantitative information from the vast number of documents increasingly available from different open sources. While many systems already perform well on tables with a simple layout, tables in the biomedical domain are often much more complex. However, benchmark and training data for such tables are very limited. To address this issue, we present a novel, highly curated benchmark dataset based on a hand-curated literature corpus on neurological disorders, which can be used to tune and evaluate table extraction applications for this challenging domain. We evaluate several state-of-the-art table extraction systems on our proposed benchmark and discuss the challenges that emerged during benchmark creation, as well as factors that can affect the performance of recognition methods. For the evaluation procedure, we propose a new metric as well as several improvements that enable a more accurate performance evaluation. The resulting benchmark dataset (https://zenodo.org/record/5549977) and the source code for our novel evaluation approach are openly accessible. Supplementary data are available at Bioinformatics online.
  • Publication
    Bringing Light Into the Dark: A large-scale Evaluation of Knowledge Graph Embedding Models Under a Unified Framework
    ( 2022) ;
    Berrendorf, Max
    ;
    Hoyt, Charles Tapley
    ;
    Vermue, Laurent
    ;
    ;
    Sharifzadeh, Sahand
    ;
    Fischer, Asja
    ;
    Tresp, Volker
    ;
    The heterogeneity in recently published knowledge graph embedding models' implementations, training, and evaluation has made fair and thorough comparisons difficult. To assess the reproducibility of previously published results, we re-implemented and evaluated 21 models in the PyKEEN software package. In this paper, we outline which results could be reproduced with their reported hyper-parameters, which could only be reproduced with alternate hyper-parameters, and which could not be reproduced at all, and we provide insight into why this might be the case. We then performed a large-scale benchmarking study on four datasets with several thousand experiments and 24,804 GPU hours of computation time. We present insights gained as to best practices, best configurations for each model, and where improvements could be made over previously published best configurations. Our results highlight that a model's performance is determined not only by its architecture but by the combination of model architecture, training approach, loss function, and the explicit modeling of inverse relations. We provide evidence that several architectures can obtain results competitive with the state of the art when configured carefully. We have made all code, experimental configurations, results, and analyses available at https://github.com/pykeen/pykeen and https://github.com/pykeen/benchmarking.
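Whether a published number can be reproduced also depends on how link-prediction ranks are computed. The abstract does not name its metrics. The sketch below assumes the commonly used rank-based metrics: mean reciprocal rank and Hits@k in the filtered setting. Ties are split evenly between the optimistic rank (`1 + better`) and the pessimistic rank (`1 + better + ties`). That tie convention, and the function names, are assumptions for illustration, not PyKEEN's API.

```python
def filtered_rank(scores: dict[str, float], true_tail: str, known_tails: set[str]) -> float:
    """Rank of true_tail among candidate tails (higher score = better).

    Candidates in known_tails (other correct answers for the same head and
    relation) are filtered out. Ties are counted as half, i.e. the mean of
    the optimistic and pessimistic ranks.
    """
    true_score = scores[true_tail]
    competitors = [s for e, s in scores.items() if e != true_tail and e not in known_tails]
    better = sum(1 for s in competitors if s > true_score)
    ties = sum(1 for s in competitors if s == true_score)
    return 1 + better + ties / 2


def mrr_and_hits(ranks: list[float], k: int = 10) -> tuple[float, float]:
    """Mean reciprocal rank and the fraction of ranks that are <= k."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = sum(1 for r in ranks if r <= k) / len(ranks)
    return mrr, hits


if __name__ == "__main__":
    scores = {"a": 0.9, "b": 0.5, "c": 0.5, "d": 0.1}
    print(filtered_rank(scores, "b", set()))   # unfiltered: "a" beats "b", "c" ties
    print(filtered_rank(scores, "b", {"a"}))   # "a" is a known answer and is removed
    print(mrr_and_hits([1, 2, 4], k=2))
```

Changing only the tie rule or the filtering moves the reported numbers. This is the kind of evaluation detail a unified framework has to fix before comparing models.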
  • Publication
    A hybrid approach unveils drug repurposing candidates targeting an Alzheimer pathophysiology mechanism
    ( 2022)
    Lage-Rupprecht, Vanessa
    ;
    ;
    Dick, Justus
    ;
    ; ;
    Gebel, Stephan
    ;
    Pless, Ole
    ;
    Reinshagen, Jeanette
    ;
    ; ; ; ; ;
    The high number of failed pre-clinical and clinical studies for compounds targeting Alzheimer disease (AD) has demonstrated that there is a need to reassess existing strategies. Here, we pursue a holistic, mechanism-centric drug repurposing approach combining computational analytics and experimental screening data. Based on this integrative workflow, we identified 77 druggable modifiers of tau phosphorylation (pTau). One of the upstream modulators of pTau, HDAC6, was screened with 5,632 drugs in a tau-specific assay, resulting in the identification of 20 repurposing candidates. Four compounds and their known targets were found to have a link to AD-specific genes. Our approach can be applied to a variety of AD-associated pathophysiological mechanisms to identify more repurposing candidates.
  • Publication
    From text generator to digital expert (Vom Textgenerator zum digitalen Experten)
    New language programs such as GPT-3 not only give machines a human-like feel for language but are also supposed to be able to turn them into specialists. What is behind this? And can it succeed?
  • Publication
    CLEP: A hybrid data- and knowledge-driven framework for generating patient representations
    ( 2021-05-08) ;
    Ali, Mehdi
    ;
    ; ; ; ;
    Hoyt, Charles Tapley
    ;
    Domingo-Fernández, Daniel
    As machine learning and artificial intelligence find a growing number of applications in the biomedical domain, their utility ultimately depends on the data used to train them. Because biomedical data are complex and high-dimensional, there is a need for approaches that combine prior knowledge of known biological interactions with patient data. Here, we present CLinical Embedding of Patients (CLEP), a novel approach that generates new patient representations by leveraging both prior knowledge and patient-level data. First, given a patient-level dataset and a knowledge graph containing relations across features that can be mapped to the dataset, CLEP incorporates patients into the knowledge graph as new nodes connected to their most characteristic features. Next, CLEP employs knowledge graph embedding models to generate new patient representations that can ultimately be used for a variety of downstream tasks, ranging from clustering to classification. We demonstrate that the patient representations generated by CLEP significantly improve performance in distinguishing patients from healthy controls for a variety of machine learning models, compared with using the original transcriptomics data. Furthermore, we show how incorporating patients into a knowledge graph can foster the interpretation and identification of biological features characteristic of a specific disease or patient subgroup. Finally, we released CLEP as an open source Python package together with examples and documentation.
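The abstract says patients become new graph nodes "connected to their most characteristic features" but does not define "characteristic". The minimal sketch below assumes that a feature is characteristic when its value deviates from the healthy controls by more than a z-score threshold. That criterion, the `up`/`down` relation names, and the function name are all hypothetical, and this is not CLEP's actual code. The embedding step that would follow is not shown.

```python
from statistics import mean, stdev


def add_patient_nodes(kg_edges, patients, controls, threshold=2.0):
    """Return a new edge list: kg_edges plus patient-to-feature edges.

    kg_edges: list of (source, relation, target) triples between features.
    patients: dict patient_id -> dict feature -> measured value.
    controls: dict feature -> list of values measured in healthy controls.
    A feature is linked to a patient if |z| > threshold (assumed criterion).
    """
    edges = list(kg_edges)  # copy, so the input graph is left unchanged
    for pid, profile in patients.items():
        for feature, value in profile.items():
            ref = controls.get(feature)
            if not ref or len(ref) < 2:
                continue  # need at least two control values for a standard deviation
            sd = stdev(ref)
            if sd == 0:
                continue  # constant in controls: z-score undefined
            z = (value - mean(ref)) / sd
            if abs(z) > threshold:
                edges.append((pid, "up" if z > 0 else "down", feature))
    return edges


if __name__ == "__main__":
    kg = [("GENE_A", "interacts", "GENE_B")]
    controls = {"GENE_A": [1.0, 2.0, 3.0], "GENE_B": [10.0, 10.0, 10.0]}
    patients = {"p1": {"GENE_A": 6.0, "GENE_B": 50.0}, "p2": {"GENE_A": 2.5}}
    print(add_patient_nodes(kg, patients, controls))
```

The augmented triples could then be handed to any knowledge graph embedding model, and each patient node's vector would serve as that patient's new representation.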
  • Publication
    PyKEEN 1.0: A python library for training and evaluating knowledge graph embeddings
    ( 2021) ;
    Berrendorf, Max
    ;
    Hoyt, Charles Tapley
    ;
    Vermue, Laurent
    ;
    Sharifzadeh, Sahand
    ;
    Tresp, Volker
    ;
    Recently, knowledge graph embeddings (KGEs) have received significant attention, and several software libraries have been developed for training and evaluating them. While each of them addresses specific needs, we report on a community effort to re-design and re-implement PyKEEN, one of the early KGE libraries. PyKEEN 1.0 enables users to compose knowledge graph embedding models from a wide range of interaction models, training approaches, and loss functions, and permits the explicit modeling of inverse relations. It allows users to measure each component's influence on the model's performance individually. In addition, an automatic memory optimization has been implemented to make optimal use of the available hardware. Through the integration of Optuna, extensive hyper-parameter optimization (HPO) functionality is provided.
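"Explicit modeling of inverse relations" usually means augmenting the training triples so that every `(head, relation, tail)` also appears as `(tail, relation⁻¹, head)`. With that augmentation, head prediction becomes tail prediction on the inverse relation. Below is a plain-Python illustration of the idea, not PyKEEN's API. PyKEEN exposes this as an option when building its triples factories, and the exact parameter should be checked in its documentation. The `_inverse` suffix is an arbitrary naming assumption.

```python
def add_inverse_triples(triples, suffix="_inverse"):
    """Return the original triples followed by one inverse triple per original.

    (h, r, t) yields the extra triple (t, r + suffix, h).
    """
    originals = list(triples)  # materialize once so generators also work
    inverses = [(t, f"{r}{suffix}", h) for h, r, t in originals]
    return originals + inverses


if __name__ == "__main__":
    print(add_inverse_triples([("berlin", "capital_of", "germany")]))
```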
  • Publication
    Automating and utilising equal-distribution data classification
    ( 2021)
    Andrienko, Gennady
    ;
    Andrienko, Natalia
    ;
    Kureshi, I.
    ;
    Lee, K.
    ;
    Smith, I.
    ;
    Staykova, T.
    Data classification, i.e. organising data items in groups (classes), is a general technique widely used in data visualisation and cartography, in particular for the creation of choropleth maps. Conventionally, data are classified by dividing the data range into intervals and assigning the same symbol or colour to all data falling within an interval. For instance, the intervals may be of the same length or may include the same number of data items. We propose a method for defining intervals so that some quantity represented by values of another attribute is equally distributed among the classes. This kind of classification supports exploratory analysis of relationships between the attribute used for the classification and the distribution of the phenomenon whose quantity is represented by the additional attribute. The approach may be especially useful when the distribution of the phenomenon is very unequal, with many data items having zero or low quantities and quite a few items having larger quantities. With such a distribution, standard statistical analysis of the relationships may be problematic. We demonstrate the potential of the approach by analysing data referring to a set of spatially distributed people (patients) in relationship to characteristics of the areas in which the people live.
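The abstract states the goal of equal-distribution classification but not the procedure. The greedy sketch below rests on an assumption: items are sorted by the classifying attribute, and a class is closed once the running total of the second attribute reaches the next equal share. This is one possible reading, not necessarily the authors' algorithm. Items sharing a value are never split across classes. The function name is hypothetical.

```python
def equal_distribution_breaks(values, quantities, n_classes):
    """Interior upper class limits so that each class holds roughly an equal
    share of sum(quantities).

    Class j contains values v with breaks[j-1] < v <= breaks[j]. The last
    class runs up to max(values). At most n_classes - 1 breaks are returned.
    """
    if n_classes < 1:
        raise ValueError("n_classes must be at least 1")
    pairs = sorted(zip(values, quantities))  # order items by classifying attribute
    total = sum(quantities)
    breaks = []
    running = 0.0
    next_target = 1  # index of the next equal-share boundary to reach
    for i, (value, quantity) in enumerate(pairs):
        running += quantity
        # Only place a break after the last item with this value (no split ties).
        last_of_value = i + 1 == len(pairs) or pairs[i + 1][0] != value
        if (last_of_value and next_target < n_classes
                and running >= total * next_target / n_classes):
            breaks.append(value)
            next_target += 1
    return breaks


if __name__ == "__main__":
    # Skewed quantity: most items carry zero, a few carry everything.
    print(equal_distribution_breaks([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 2], 2))
```

In the skewed example, an equal-count split would place the boundary at 3, which leaves all of the quantity in the upper class. The equal-distribution break lands at 5 instead, so each class holds half of the total.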
  • Publication
    Author Correction: Replication and Refinement of an Algorithm for Automated Drusen Segmentation on Optical Coherence Tomography
    ( 2021)
    Wintergerst, M.W.M.
    ;
    Gorgi Zadeh, S.
    ;
    Wiens, Vitalis
    ;
    Thiele, S.
    ;
    Schmitz-Valckenberg, S.
    ;
    Holz, F.G.
    ;
    Finger, R.P.
    ;
    Schultz, T.
    The Acknowledgements section in the original version of this Article was incomplete. "This research was supported by the Else Kröner-Fresenius Foundation/German Scholars Organization (EKFS/GSO 16) to RF, the BONFOR GEROK Program, Faculty of Medicine, University of Bonn, (Grant No. O-137.0028) to MW, the GEROK Program, Faculty of Medicine, University of Bonn, (Grant No. O-137.0026) to ST and the German Ministry of Education and Research (BMBF), FKZ 13N10349. The funders had no role in study design, data collection, data analysis, data interpretation, or writing of the report." now reads: "This research was supported by the Else Kröner-Fresenius Foundation/German Scholars Organization (EKFS/GSO 16) to RF, the BONFOR GEROK Program, Faculty of Medicine, University of Bonn, (Grant No. O-137.0028) to MW, the GEROK Program, Faculty of Medicine, University of Bonn, (Grant No. O-137.0026) to ST and the German Ministry of Education and Research (BMBF), FKZ 13N10349. The work of Shekoufeh Gorgi Zadeh was supported by a grant from Deutsche Forschungsgemeinschaft (DFG), grant number SCHM 2966/2-1. The funders had no role in study design, data collection, data analysis, data interpretation, or writing of the report." The original Article has been corrected.