2021
Conference Paper
Title

DistSim - Scalable Distributed in-Memory Semantic Similarity Estimation for RDF Knowledge Graphs

Abstract
In this paper, we present DistSim, a Scalable Distributed in-Memory Semantic Similarity Estimation framework for Knowledge Graphs. DistSim provides a range of state-of-the-art similarity estimators. We built the Similarity Estimation Pipeline by combining generic software modules. For large-scale RDF data, DistSim uses MinHash with locality-sensitive hashing (LSH), which scales better than exhaustive all-pair similarity estimation. The modules of DistSim can be configured through a set of (hyper)parameters that adjust the trade-off between the amount of information taken into account and processing time. Furthermore, the output of the Similarity Estimation Pipeline is native RDF. DistSim is integrated into the SANSA stack, documented with Scaladoc, and covered by unit tests. Additionally, its variables and methods follow the Apache Spark MLlib namespace conventions. We evaluated the performance of DistSim on a distributed cluster, measuring processing time as a function of data set size and of processing power; the results show that DistSim scales with increasing data set sizes and processing power. DistSim is already used in several RDF data analytics use cases, and it is available as part of the open-source SANSA project on GitHub.
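The abstract names MinHash with LSH as the way DistSim avoids all-pair comparisons but does not show the mechanics. The following is a minimal single-machine Python sketch of that general technique, not DistSim's Spark implementation. The feature extraction (each entity described by the set of its outgoing predicate-object pairs), the hash family, the parameters (64 hash functions split into 16 bands of 4 rows), and the example triples are all assumptions made for illustration.

```python
import hashlib
import random
from collections import defaultdict
from itertools import combinations

# Assumption: each entity is represented by the set of its outgoing
# "predicate object" pairs; DistSim may extract features differently.
PRIME = (1 << 61) - 1  # Mersenne prime, modulus of the universal hash family


def feature_id(feature: str) -> int:
    """Deterministic integer ID for a feature (built-in hash() is salted per process)."""
    digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % PRIME


def make_hash_funcs(num_hashes: int, seed: int = 7):
    """Draw random functions h(x) = (a*x + b) mod PRIME with a != 0."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(num_hashes)]


def minhash_signature(features, hash_funcs):
    """For each hash function, keep the minimum hash value over the entity's features."""
    ids = [feature_id(f) for f in features]
    return [min((a * x + b) % PRIME for x in ids) for a, b in hash_funcs]


def estimated_jaccard(sig_a, sig_b):
    """The fraction of agreeing signature positions estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a, b):
    """Exact |A & B| / |A | B|, used only to verify LSH candidates."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def lsh_candidate_pairs(signatures, bands, rows):
    """Banding: two entities become candidates if they agree on every row of some band."""
    buckets = defaultdict(set)
    for entity, sig in signatures.items():
        for band in range(bands):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].add(entity)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


# Hypothetical example triples (subject, predicate, object).
triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "ex:worksAt", "ex:iais"),
    ("ex:carol", "foaf:knows", "ex:bob"),
    ("ex:carol", "rdf:type", "foaf:Person"),
    ("ex:carol", "ex:worksAt", "ex:iais"),
    ("ex:acme", "rdf:type", "ex:Company"),
    ("ex:acme", "ex:locatedIn", "ex:bonn"),
]

features = defaultdict(set)
for s, p, o in triples:
    features[s].add(f"{p} {o}")

hash_funcs = make_hash_funcs(num_hashes=64)
signatures = {e: minhash_signature(f, hash_funcs) for e, f in features.items()}
candidates = lsh_candidate_pairs(signatures, bands=16, rows=4)

# Only candidate pairs are scored, instead of all n*(n-1)/2 pairs.
for a, b in sorted(candidates):
    print(a, b, estimated_jaccard(signatures[a], signatures[b]),
          exact_jaccard(features[a], features[b]))
```

Here `ex:alice` and `ex:carol` have identical feature sets, so their signatures agree in every band and they are reported as a candidate pair, while `ex:acme` shares no feature with either and is never compared. More bands with fewer rows each raise recall at the cost of more candidate pairs, which is one instance of the information versus processing time trade-off the abstract mentions. In a Spark setting, MLlib's `MinHashLSH` estimator with `approxSimilarityJoin` offers a distributed counterpart; whether DistSim uses that class is not stated here.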
Author(s)
Draschner, Carsten Felix
Lehmann, Jens  
Jabeen, Hajira
Mainwork
IEEE 15th International Conference on Semantic Computing, ICSC 2021. Proceedings  
Project(s)
PLATOON  
Funder
European Commission EC  
Conference
International Conference on Semantic Computing (ICSC) 2021  
Open Access
DOI
10.1109/ICSC50631.2021.00062
Language
English
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Keyword(s)
  • Distributed RDF Analytics
  • Scalable Semantic Similarity Estimation
  • Knowledge Graph Data Analytics Pipeline
  • SANSA