Luzzu - A methodology and framework for linked data quality assessment
The increasing variety of Linked Data on the Web makes it challenging to determine the quality of this data and, subsequently, to make this information explicit to data consumers. Despite the availability of a number of tools and frameworks to assess Linked Data quality, the output of such tools is not suitable for machine consumption, and thus consumers can hardly compare and rank datasets in order of fitness for use. This article describes a conceptual methodology for assessing Linked Datasets, and Luzzu, a framework for Linked Data quality assessment. Luzzu is based on four major components: (1) an extensible interface for defining new quality metrics; (2) an interoperable, ontology-driven back-end for representing quality metadata and quality problems that can be re-used within different semantic frameworks; (3) scalable dataset processors for data dumps, SPARQL endpoints, and big data infrastructures; and (4) a customisable ranking algorithm that takes user-defined weights into account. We show that Luzzu scales linearly with the number of triples in a dataset. We also demonstrate the applicability of the Luzzu framework by evaluating and analysing a number of statistical datasets against a variety of metrics. This article contributes towards the definition of a holistic data quality lifecycle, in terms of the co-evolution of linked datasets, with the final aim of improving their quality.
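To make the idea of ranking with user-defined weights concrete, the following is a minimal sketch and not Luzzu's actual algorithm or API. It assumes that each metric yields a value in [0, 1] per dataset and that the overall score is a weight-normalised sum of those values; the function name `rank_datasets`, the metric names, and the example figures are all hypothetical.

```python
from typing import Dict, List, Tuple


def rank_datasets(
    metric_values: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank datasets by a weighted average of their quality metric values.

    metric_values maps dataset name -> {metric name -> value in [0, 1]}.
    weights maps metric name -> non-negative user-defined weight.
    Both the scoring rule and the data layout are assumptions for illustration.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")

    scores = {}
    for dataset, values in metric_values.items():
        # A metric missing for a dataset contributes 0 (an assumed convention).
        weighted = sum(w * values.get(metric, 0.0) for metric, w in weights.items())
        scores[dataset] = weighted / total_weight

    # Highest score first, i.e. best fitness for use under these weights.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical metric values for two datasets.
    values = {
        "dataset_a": {"completeness": 0.9, "consistency": 0.5},
        "dataset_b": {"completeness": 0.6, "consistency": 0.9},
    }
    # A consumer who cares three times more about consistency.
    print(rank_datasets(values, {"completeness": 1.0, "consistency": 3.0}))
```

Under this assumed scoring rule, changing the weights changes the ordering: a consumer who weights completeness more heavily would see `dataset_a` ranked first instead.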