Optimizing SPARQL query processing on dynamic and static data based on query time/freshness requirements using materialization
To integrate various Linked Datasets, data warehousing and live query processing provide two extremes for optimized response time and quality respectively. The first approach provides very fast responses but with low-quality because changes of original data are not immediately reflected on materialized data. The second approach provides accurate responses but it is notorious for long response times. A hybrid SPARQL query processor provides a middle ground between two specified extremes by splitting the triple patterns of SPARQL query between live and local processors based on a predetermined coherence threshold specified by the administrator. Considering quality requirements while splitting the SPARQL query, enables the processor to eliminate the unnecessary live execution and releases resources for other queries. This requires estimating the quality of response provided with current materialized data, compare it with user requirements and determine the most selective sub-queries which can boost the response quality up to the specified level with least computational complexity. In this work, we propose solutions for estimating the freshness of materialized data, as one dimension of the quality, by extending cardinality estimation techniques. Experimental results show that we can estimate the freshness of materialized data with a low error rate.