Provenance management over linked data streams
Provenance describes how results are produced starting from data sources, curation, recovery, intermediate processing, to the final results. Provenance has been applied to solve many problems and in particular to understand how errors are propagated in large-scale environments such as Internet of Things, Smart Cities. In fact, in such environments operations on data are often performed by multiple uncoordinated parties, each potentially introducing or propagating errors. These errors cause uncertainty of the overall data analytics process that is further amplified when many data sources are combined and errors get propagated across multiple parties. The ability to properly identify how such errors influence the results is crucial to assess the quality of the results. This problem becomes even more challenging in the case of Linked Data Streams, where data is dynamic and often incomplete. In this paper, we introduce methods to compute provenance over Linked Data Streams. More specifically, we propose provenance management techniques to compute provenance of continuous queries executed over complete Linked Data streams. Unlike traditional provenance management techniques, which are applied on static data, we focus strictly on the dynamicity and heterogeneity of Linked Data streams. Specifically, in this paper we describe: i) means to deliver a dynamic provenance trace of the results to the user, ii) a system capable to execute queries over dynamic Linked Data and compute provenance of these queries, and iii) an empirical evaluation of our approach using real-world datasets.