  • Publication
    CLEP: A hybrid data- and knowledge-driven framework for generating patient representations
    (2021-05-08)
    Ali, Mehdi; Hoyt, Charles Tapley; Domingo-Fernández, Daniel
    As machine learning and artificial intelligence increasingly attain a larger number of applications in the biomedical domain, at their core, their utility depends on the data used to train them. Due to the complexity and high dimensionality of biomedical data, there is a need for approaches that combine prior knowledge around known biological interactions with patient data. Here, we present CLinical Embedding of Patients (CLEP), a novel approach that generates new patient representations by leveraging both prior knowledge and patient-level data. First, given a patient-level dataset and a knowledge graph containing relations across features that can be mapped to the dataset, CLEP incorporates patients into the knowledge graph as new nodes connected to their most characteristic features. Next, CLEP employs knowledge graph embedding models to generate new patient representations that can ultimately be used for a variety of downstream tasks, ranging from clustering to classification. We demonstrate how using new patient representations generated by CLEP significantly improves performance in classifying between patients and healthy controls for a variety of machine learning models, as compared to the use of the original transcriptomics data. Furthermore, we also show how incorporating patients into a knowledge graph can foster the interpretation and identification of biological features characteristic of a specific disease or patient subgroup. Finally, we released CLEP as an open source Python package together with examples and documentation.
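    The first CLEP step described above (adding patients to the knowledge graph as nodes linked to their most characteristic features) might look roughly like the sketch below. The z-score threshold, the "up"/"down" relation names, and the function `patient_edges` are assumptions made for illustration; the abstract does not say how CLEP selects characteristic features, and this is not the package's API.

```python
from statistics import mean, pstdev


def patient_edges(values, threshold=1.0):
    """Return (patient, relation, feature) triples for outlying measurements.

    Assumption: a feature is "characteristic" of a patient when its z-score
    across the cohort exceeds `threshold` in magnitude.
    """
    features = sorted({f for row in values.values() for f in row})
    triples = []
    for feature in features:
        column = [row[feature] for row in values.values() if feature in row]
        mu, sigma = mean(column), pstdev(column)
        if sigma == 0:
            continue  # a constant feature distinguishes no patient
        for patient, row in sorted(values.items()):
            if feature not in row:
                continue
            z = (row[feature] - mu) / sigma
            if z > threshold:
                triples.append((patient, "up", feature))
            elif z < -threshold:
                triples.append((patient, "down", feature))
    return triples


# Toy prior knowledge and toy patient-level data (both hypothetical).
knowledge_graph = [("GENE_A", "increases", "GENE_B")]
expression = {
    "patient_1": {"GENE_A": 9.0, "GENE_B": 1.0},
    "patient_2": {"GENE_A": 1.0, "GENE_B": 1.2},
    "patient_3": {"GENE_A": 1.2, "GENE_B": 0.8},
}

# Patients become new nodes in the same triple list as the prior knowledge.
extended_graph = knowledge_graph + patient_edges(expression)
```

    The extended triple list could then be passed to any knowledge graph embedding model, and the resulting vectors of the patient nodes would serve as the new patient representations.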
  • Publication
    Integration of Structured Biological Data Sources using Biological Expression Language
    (2019-05-08)
    Hoyt, Charles Tapley; Llaó, Josep Marin; Konotopez, Andrej; Muslu, Özlem; English, Bradley; Müller, Simon; Lacerda, Mauricio Pio De; Colby, Scott; Türei, Dénes; Palacio-Escat, Nicolàs
    Background: The integration of heterogeneous, multiscale, and multimodal knowledge and data has become a common prerequisite for joint analysis to unravel the mechanisms and aetiologies of complex diseases. Because of its unique ability to capture this variety, Biological Expression Language (BEL) is well suited to be further used as a platform for semantic integration and harmonization in networks and systems biology. Results: We have developed numerous independent packages capable of downloading, structuring, and serializing various biological data sources to BEL. Each Bio2BEL package is implemented in the Python programming language and distributed through GitHub (https://github.com/bio2bel) and PyPI. Conclusions: The philosophy of Bio2BEL encourages reproducibility, accessibility, and democratization of biological databases. We present several applications of Bio2BEL packages including their ability to support the curation of pathway mappings, integration of pathway databases, and machine learning applications.
  • Publication
    RatVec: A General Approach for Low-dimensional Distributed Vector Representations via Rational Kernels
    (2019)
    Brito, Eduardo; Domingo-Fernández, Daniel; Hoyt, Charles Tapley
    We present a general framework, RatVec, for learning vector representations of non-numeric entities based on domain-specific similarity functions interpreted as rational kernels. We show competitive performance using k-nearest neighbors in the protein family classification task and in Dutch spelling correction. To promote re-usability and extensibility, we have made our code and pre-trained models available at https://github.com/ratvec.
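    To make the idea of a domain-specific similarity function concrete, here is a minimal sketch that uses an n-gram count kernel (a standard example of a rational kernel) and a nearest-neighbor lookup for spelling correction. It compares strings directly and skips the step of learning low-dimensional vectors, so it is not the RatVec method itself; all function names and the toy vocabulary are assumptions.

```python
import math
from collections import Counter


def ngram_counts(text, n=2):
    """Count the overlapping character n-grams of `text`."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_kernel(a, b, n=2):
    """Sum over shared n-grams of the product of their counts."""
    counts_a, counts_b = ngram_counts(a, n), ngram_counts(b, n)
    return sum(count * counts_b[gram] for gram, count in counts_a.items())


def similarity(a, b, n=2):
    """Kernel value normalized to [0, 1]; 0.0 if either string is too short."""
    denom = math.sqrt(ngram_kernel(a, a, n) * ngram_kernel(b, b, n))
    return ngram_kernel(a, b, n) / denom if denom else 0.0


def nearest(query, vocabulary, n=2):
    """Return the vocabulary entry most similar to `query` (1-nearest neighbor)."""
    return max(vocabulary, key=lambda word: similarity(query, word, n))


# Hypothetical vocabulary of correctly spelled Dutch words.
vocabulary = ["fiets", "huis", "water"]
suggestion = nearest("fietz", vocabulary)
```

    With a larger k, the same similarity could drive a k-nearest-neighbors classifier, for example over protein sequences instead of words.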
  • Publication
    The KEEN Universe. An Ecosystem for Knowledge Graph Embeddings with a Focus on Reproducibility and Transferability
    (2019)
    Jabeen, Hajira; Hoyt, Charles Tapley
    There is an emerging trend of embedding knowledge graphs (KGs) in continuous vector spaces in order to use those for machine learning tasks. Recently, many knowledge graph embedding (KGE) models have been proposed that learn low dimensional representations while trying to maintain the structural properties of the KGs such as the similarity of nodes depending on their edges to other nodes. KGEs can be used to address tasks within KGs such as the prediction of novel links and the disambiguation of entities. They can also be used for downstream tasks like question answering and fact-checking. Overall, these tasks are relevant for the semantic web community. Despite their popularity, the reproducibility of KGE experiments and the transferability of proposed KGE models to research fields outside the machine learning community can be a major challenge. Therefore, we present the KEEN Universe, an ecosystem for knowledge graph embeddings that we have developed with a strong focus on reproducibility and transferability. The KEEN Universe currently consists of the Python packages PyKEEN (Python KnowlEdge EmbeddiNgs), BioKEEN (Biological KnowlEdge EmbeddiNgs), and the KEEN Model Zoo for sharing trained KGE models with the community.
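    As an illustration of how an embedding can be used to predict novel links, the sketch below scores candidate triples with TransE, one well-known KGE model, using hand-set toy vectors. It is plain Python and does not use the PyKEEN API; the entity names, the vectors, and the helper `rank_tails` are assumptions, and real embeddings would be learned from the graph rather than written by hand.

```python
import math


def transe_score(head, relation, tail):
    """TransE plausibility: negative Euclidean distance of head + relation from tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_tails(head_name, relation_name, entities, relations):
    """Rank every other entity as a candidate tail, most plausible first."""
    head, relation = entities[head_name], relations[relation_name]
    scored = [
        (name, transe_score(head, relation, vector))
        for name, vector in entities.items()
        if name != head_name
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy two-dimensional embeddings (hypothetical, not trained).
entities = {
    "gene_1": (0.0, 0.0),
    "pathway_a": (1.0, 0.0),
    "pathway_b": (0.0, 2.0),
}
relations = {"part_of": (1.0, 0.1)}

# Which pathway is gene_1 most plausibly part of under these vectors?
ranked = rank_tails("gene_1", "part_of", entities, relations)
```

    Here `ranked` lists `pathway_a` before `pathway_b`, because `gene_1 + part_of` lands much closer to `pathway_a` in the vector space.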
  • Publication
    Predicting Missing Links Using PyKEEN
    (2019)
    Hoyt, Charles Tapley; Domingo-Fernandez, Daniel
    PyKEEN is a framework that integrates several approaches to compute knowledge graph embeddings (KGEs). We demonstrate the usage of PyKEEN in a biomedical use case, i.e., we trained and evaluated several KGE models on a biological knowledge graph containing gene annotations to pathways and pathway hierarchies from well-known databases. We used the best-performing model to predict new links and present an evaluation in collaboration with a domain expert.