Querying Interlinked Data by Bridging RDF Molecule Templates
Linked Data initiatives have encouraged the publication of a large number of RDF datasets created independently by different data providers. These datasets can be accessed through different Web interfaces, e.g., SPARQL endpoints; however, federated query engines are still required to provide an integrated view of them. Given the large number of Web-accessible RDF datasets, federated SPARQL query engines implement query processing techniques to select the relevant datasets that provide the data required to answer a query. Existing federated query engines usually rely on coarse-grained source descriptions in which datasets are characterized by their vocabularies or schemas, and details about the data in a dataset, e.g., classes, properties, or relations, are ignored. This lack of detail in source descriptions may lead to the erroneous selection of data sources for a query and to unnecessary data retrieval and source communication, thus degrading the performance of query processing over the federation. We address the problem of federated SPARQL query processing and devise MULDER, a query engine for federations of RDF data sources. MULDER describes data sources in terms of RDF molecule templates, i.e., abstract descriptions of entities belonging to the same RDF class, and uses them for source selection and for query decomposition and optimization. We empirically study the performance and continuous efficiency of MULDER on existing benchmarks and compare it with existing federated SPARQL query engines. The experimental results suggest that RDF molecule templates empower MULDER and allow for a selection of RDF data sources that not only reduces execution time, but also increases answer completeness and continuous efficiency.
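
The contrast between vocabulary-level and class-level source descriptions can be illustrated with a minimal sketch. It is not MULDER's implementation: the `MoleculeTemplate` class, the `select_sources` function, the `ex:` terms, and the endpoint URLs are all hypothetical names introduced here, and a template is assumed to record only an RDF class, the predicates its entities carry, and the endpoints that serve it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoleculeTemplate:
    """Hypothetical, simplified stand-in for an RDF molecule template."""
    rdf_class: str         # RDF class whose entities are described
    predicates: frozenset  # predicates carried by entities of that class
    sources: frozenset     # endpoints assumed to serve those entities


def select_sources(star_predicates, templates):
    """Map each selected endpoint to the classes that match one star-shaped subquery.

    A template matches only if it covers every predicate of the star, so a
    source that merely shares part of the vocabulary is not selected.
    """
    needed = set(star_predicates)
    selected = {}
    for template in templates:
        if needed <= template.predicates:
            for source in template.sources:
                selected.setdefault(source, set()).add(template.rdf_class)
    return selected


# Illustrative federation: both classes use ex:name, only ex:Drug has ex:target.
DRUGS = "http://example.org/drugs/sparql"
PROTEINS = "http://example.org/proteins/sparql"
TEMPLATES = [
    MoleculeTemplate("ex:Drug", frozenset({"ex:name", "ex:target"}), frozenset({DRUGS})),
    MoleculeTemplate("ex:Protein", frozenset({"ex:name", "ex:sequence"}), frozenset({PROTEINS})),
]

# Star-shaped subquery: ?d ex:name ?n . ?d ex:target ?t .
print(select_sources(["ex:name", "ex:target"], TEMPLATES))
```

Under these assumptions, a description based on predicates alone would route the `ex:name` triple pattern to both endpoints, whereas matching the whole star against class-level templates selects only the drugs endpoint.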