2022
Journal Article
Title

STonKGs: A sophisticated transformer trained on biomedical text and knowledge graphs

Abstract
Motivation: The majority of biomedical knowledge is stored in structured databases or as unstructured text in scientific publications. This vast amount of information has led to numerous machine learning-based biological applications using either text through natural language processing (NLP) or structured data through knowledge graph embedding models. However, representations based on a single modality are inherently limited.

Results: To generate better representations of biological knowledge, we propose STonKGs, a Sophisticated Transformer trained on biomedical text and Knowledge Graphs (KGs). This multimodal Transformer uses combined input sequences of structured information from KGs and unstructured text data from biomedical literature to learn joint representations in a shared embedding space. First, we pre-trained STonKGs on a knowledge base assembled by the Integrated Network and Dynamical Reasoning Assembler consisting of millions of text-triple pairs extracted from biomedical literature by multiple NLP systems. Then, we benchmarked STonKGs against three baseline models trained on either one of the modalities (i.e. text or KG) across eight different classification tasks, each corresponding to a different biological application. Our results demonstrate that STonKGs outperforms both baselines, especially on the more challenging tasks with respect to the number of classes, improving upon the F1-score of the best baseline by up to 0.084 (i.e. from 0.881 to 0.965). Finally, our pre-trained model as well as the model architecture can be adapted to various other transfer learning applications.

Availability and implementation: We make the source code and the Python package of STonKGs available on GitHub (https://github.com/stonkgs/stonkgs) and PyPI (https://pypi.org/project/stonkgs/). The pre-trained STonKGs models and the task-specific classification models are available at https://huggingface.co/stonkgs/stonkgs-150k and https://zenodo.org/communities/stonkgs, respectively.
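The idea of a "combined input sequence" from the abstract can be illustrated with a toy sketch. This is not the STonKGs implementation: the whitespace tokenizer, the `[CLS]`/`[SEP]` markers, the segment-id convention, and the names `CombinedInput` and `build_combined_input` are all assumptions made for illustration. It only shows how one text-triple pair might be laid out as a single sequence in which each position is tagged with its modality.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical special tokens, borrowed from common BERT-style conventions.
CLS, SEP = "[CLS]", "[SEP]"


@dataclass
class CombinedInput:
    tokens: List[str]       # text tokens followed by KG triple elements
    segment_ids: List[int]  # 0 = text part, 1 = KG part


def build_combined_input(text: str, triple: Tuple[str, str, str]) -> CombinedInput:
    """Lay out one text-triple pair as a single tagged sequence (toy version)."""
    text_tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    kg_tokens = list(triple)            # (head, relation, tail) kept as whole symbols
    tokens = [CLS] + text_tokens + [SEP] + kg_tokens + [SEP]
    # [CLS], the text tokens, and the first [SEP] belong to segment 0;
    # the triple and the closing [SEP] belong to segment 1.
    segment_ids = [0] * (len(text_tokens) + 2) + [1] * (len(kg_tokens) + 1)
    return CombinedInput(tokens, segment_ids)


# Illustrative pair only; not taken from the STonKGs pre-training data.
example = build_combined_input(
    "TP53 activates CDKN1A", ("TP53", "increases", "CDKN1A")
)
print(example.tokens)
print(example.segment_ids)
```

A real multimodal model would map the two segments to embeddings from different sources before feeding them to the Transformer; the sketch stops at the sequence layout.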
Author(s)
Balabin, Helena
Fraunhofer-Institut für Algorithmen und Wissenschaftliches Rechnen SCAI  
Hoyt, Charles Tapley
Harvard University
Birkenbihl, Colin  
Fraunhofer-Institut für Algorithmen und Wissenschaftliches Rechnen SCAI  
Gyori, Benjamin
Harvard University
Bachman, John
Harvard University
Kodamullil, Alpha
Fraunhofer-Institut für Algorithmen und Wissenschaftliches Rechnen SCAI  
Plöger, Paul
Bonn-Rhein-Sieg University of Applied Sciences
Hofmann-Apitius, Martin  
Fraunhofer-Institut für Algorithmen und Wissenschaftliches Rechnen SCAI  
Domingo-Fernández, Daniel
Fraunhofer-Institut für Algorithmen und Wissenschaftliches Rechnen SCAI  
Journal
Bioinformatics  
Open Access
Rights
CC BY 4.0: Creative Commons Attribution
DOI
10.24406/publica-r-271462
10.1093/bioinformatics/btac001
Language
English
Fraunhofer-Institut für Algorithmen und Wissenschaftliches Rechnen SCAI  
Keyword(s)
  • machine learning
  • text-mining
  • bioinformatic
  • artificial intelligence
  • transformers
  • knowledge-graphs
