SJoin: A semantic join operator to integrate heterogeneous RDF graphs

Galkin, M.; Collarana, D.; Traverso-Ribón, I.; Vidal, M.-E.; Auer, S.


Benslimane, D.:
Database and expert systems applications. 28th international conference, DEXA 2017. Vol.1 : Lyon, France, August 28-31, 2017; Proceedings
Cham: Springer International Publishing, 2017 (Lecture Notes in Computer Science 10438)
ISBN: 978-3-319-64467-7 (Print)
ISBN: 978-3-319-64468-4 (Online)
International Conference on Database and Expert Systems Applications (DEXA) <28, 2017, Lyon>
Conference Paper
Fraunhofer IAIS

Semi-structured data models like the Resource Description Framework (RDF) naturally allow for modeling the same real-world entity in various ways. For example, different RDF vocabularies enable the definition of various RDF graphs representing the same drug in Bio2RDF or Drugbank. Albeit semantically equivalent, these RDF graphs may be syntactically different, i.e., they have distinctive graph structure or entity identifiers and properties. Existing data-driven integration approaches only consider syntactic matching criteria or similarity measures to solve the problem of integrating RDF graphs. However, syntactic-based approaches are unable to semantically integrate heterogeneous RDF graphs. We devise SJoin, a semantic similarity join operator to solve the problem of matching semantically equivalent RDF graphs, i.e., syntactically different graphs corresponding to the same real-world entity. Two physical implementations are proposed for SJoin which follow blocking or non-blocking data processing strategies, i.e., RDF graphs can be merged in a batch or incrementally. We empirically evaluate the effectiveness and efficiency of the SJoin physical operators with respect to baseline similarity join algorithms. Experimental results suggest that SJoin outperforms baseline approaches, i.e., non-blocking SJoin incrementally produces results faster, while the blocking SJoin accurately matches all semantically equivalent RDF graphs.
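
The difference between the blocking (batch) and non-blocking (incremental) strategies mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `EntityGraph` representation, the function names, the threshold parameter, and the Jaccard measure are all assumptions made for illustration. In particular, Jaccard overlap is a purely syntactic stand-in for the semantic similarity measure that SJoin relies on, which the abstract does not specify.

```python
from typing import Callable, FrozenSet, Iterable, Iterator, List, Tuple

# Assumed representation: a subject identifier plus its (property, value) pairs.
EntityGraph = Tuple[str, FrozenSet[Tuple[str, str]]]
Similarity = Callable[[EntityGraph, EntityGraph], float]


def jaccard(a: EntityGraph, b: EntityGraph) -> float:
    """Placeholder similarity: overlap of (property, value) pairs.

    Purely syntactic; a semantic measure would be plugged in here instead.
    """
    union = a[1] | b[1]
    return len(a[1] & b[1]) / len(union) if union else 0.0


def blocking_join(left: Iterable[EntityGraph], right: Iterable[EntityGraph],
                  sim: Similarity, threshold: float) -> List[Tuple[str, str]]:
    """Batch strategy: consume both inputs fully, then compare all pairs."""
    left_all, right_all = list(left), list(right)
    return [(l[0], r[0]) for l in left_all for r in right_all
            if sim(l, r) >= threshold]


def non_blocking_join(stream: Iterable[Tuple[str, EntityGraph]],
                      sim: Similarity,
                      threshold: float) -> Iterator[Tuple[str, str]]:
    """Incremental strategy: items arrive tagged "L" or "R".

    Each arrival is compared against everything seen so far from the
    opposite side, so matches are emitted before the inputs are exhausted.
    """
    seen = {"L": [], "R": []}
    for side, graph in stream:
        other = "R" if side == "L" else "L"
        for candidate in seen[other]:
            if sim(graph, candidate) >= threshold:
                # Always report pairs as (left id, right id).
                yield ((graph[0], candidate[0]) if side == "L"
                       else (candidate[0], graph[0]))
        seen[side].append(graph)
```

In this sketch both strategies find the same pairs for the same inputs and threshold. The non-blocking variant trades extra bookkeeping for earlier first results, which mirrors the trade-off the abstract reports only at a very coarse level.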