License: CC BY
Authors: Balabin, Helena; Hoyt, Charles Tapley; Gyori, Benjamin; Bachman, John; Tom Kodamullil, Alpha; Hofmann-Apitius, Martin; Domingo Fernández, Daniel
Dates: 2022-05-23; 2022-05-23; 2022-01-10
Handle: https://publica.fraunhofer.de/handle/publica/417902
DOI: https://doi.org/10.24406/publica-72
Scopus EID: 2-s2.0-85128937148

Abstract: Most approaches exploit either unstructured data from the biomedical literature or structured data from biomedical knowledge graphs. Combining the two can draw on the advantages of both and ultimately improve representations of biology. Multimodal transformers used for this purpose can improve performance on context-dependent classification tasks, as demonstrated by our previous model, the Sophisticated Transformer Trained on Biomedical Text and Knowledge Graphs (STonKGs). In this work, we introduce ProtSTonKGs, a transformer aimed at learning all-encompassing representations of protein-protein interactions. ProtSTonKGs extends our previous work by adding textual protein descriptions and amino acid sequences (i.e., structural information) to the text- and knowledge graph-based input sequence used in STonKGs. We benchmark ProtSTonKGs against STonKGs and observe F1 score improvements of up to 0.066 (i.e., from 0.204 to 0.270) on several tasks, such as predicting protein interactions in different contexts. Our work demonstrates how multimodal transformers can be used to integrate heterogeneous sources of information, laying the foundation for future approaches that use multiple modalities for biomedical applications.

Language: en
Keywords: Natural Language Processing; Knowledge Graphs; Transformers; Bioinformatics; Machine Learning
DDC: 000 Computer science, information science, general works :: 000 Computer science, knowledge, systems
DDC: 500 Natural sciences and mathematics :: 570 Life sciences; biology
DDC: 600 Technology, medicine, applied sciences :: 610 Medicine and health
Title: ProtSTonKGs: A Sophisticated Transformer Trained on Protein Sequences, Text, and Knowledge Graphs
Type: conference paper