Semantic Graph Queries on Linked Data in Knowledge Graphs
Knowledge graphs (KGs) play a central role in knowledge mining and discovery and in big data integration, especially for connecting data from different domains. One of their key assets is bringing structured as well as unstructured data, e.g. from scientific literature and various data sources, into a structured, comparable format. KGs are usually stored in graph databases. Although a lot of research has been done in the fields of query optimization, query transformation, and the storage and retrieval of large-scale knowledge graphs, algorithmic optimization is still a major challenge and a vital factor in using graph databases. Few researchers have addressed the problem of optimizing algorithms on large-scale labeled property graphs. Here, we present two optimization approaches and compare them with a naive approach of directly querying the graph database. The aim of our work is to determine the limiting factors of graph databases like Neo4j, and we describe a novel solution to tackle these challenges. For this, we suggest a classification schema to distinguish the complexity of problems on a graph database. In addition, we propose several other applications for graph methods within the domain of digital humanities. We show how the schema helps to understand the algorithmic challenges for semantic graph queries. We evaluate our optimization approaches on a test system containing a knowledge graph derived from biomedical publication data enriched with text mining data. This dense graph has more than 71 million nodes and 850 million relationships. The results are very encouraging: depending on the problem, we were able to show a speedup by a factor between 44 and 3839.