Authors: Dorsch, Rene; Freund, Michael; Fries, Justus; Harth, Andreas
Record date: 2024-05-27
Publication year: 2023
Handle: https://publica.fraunhofer.de/handle/publica/468790
Scopus ID: 2-s2.0-85187549451
Abstract: We present GraphGuard, a data validation framework that improves the data quality of pipelines that populate knowledge graphs. The inputs to these pipelines often come from different sources, so different approaches are needed to validate the data against different defects. This leads to differing formats for validation reports, which degrades the contextual, representational, and accessibility quality dimensions of data validation. The proposed framework consists of QualityContracts and Guardians. QualityContracts encapsulate the necessary data validation requirements in both human-readable and machine-readable formats. Software agents, called Guardians, use the machine-readable format to execute validation methods. We validate the practicality of our framework on a data processing pipeline deployed at a large European airport, using several months of data. A comparative analysis between a basic data processing pipeline and a pipeline using our framework showed improvements in the data quality criteria of believability, interpretability, ease of understanding, consistency of representation, conciseness of representation, and accessibility.
Language: en
Keywords: Data Quality; Data Validation; Knowledge Graph; Process Optimization
Title: GraphGuard: Enhancing Data Quality in Knowledge Graph Pipelines
Type: conference paper
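
The division of labour described in the abstract, where QualityContracts state a requirement in human-readable and machine-readable form and Guardians execute the machine-readable part, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class layout, the `ValidationResult` type, the `validate` method, and the `flight_id` and `gate` fields are all hypothetical names chosen for illustration, and the record does not specify the actual contract format.

```python
from dataclasses import dataclass
from typing import Any, Callable

Record = dict[str, Any]


@dataclass(frozen=True)
class QualityContract:
    """Hypothetical contract: one requirement in two forms."""
    name: str
    description: str                  # human-readable statement
    check: Callable[[Record], bool]   # machine-readable rule (assumed form)


@dataclass(frozen=True)
class ValidationResult:
    """Hypothetical uniform report entry, the same shape for every contract."""
    contract: str
    passed: bool
    message: str


class Guardian:
    """Hypothetical agent that runs every contract against a record."""

    def __init__(self, contracts: list[QualityContract]) -> None:
        self.contracts = contracts

    def validate(self, record: Record) -> list[ValidationResult]:
        results = []
        for contract in self.contracts:
            passed = bool(contract.check(record))
            # Report the human-readable description only on failure.
            message = "ok" if passed else f"violated: {contract.description}"
            results.append(ValidationResult(contract.name, passed, message))
        return results


# Illustrative contracts; the field names are assumptions, not from the paper.
contracts = [
    QualityContract(
        "has_flight_id",
        "Every record carries a non-empty flight identifier.",
        lambda r: bool(r.get("flight_id")),
    ),
    QualityContract(
        "gate_is_string",
        "The gate, when present, is a string.",
        lambda r: r.get("gate") is None or isinstance(r["gate"], str),
    ),
]

guardian = Guardian(contracts)
report = guardian.validate({"flight_id": "", "gate": "A12"})
for result in report:
    print(result)
```

Because every contract yields the same `ValidationResult` shape, reports from checks on different sources stay in one format, which is the representational consistency the abstract is concerned with.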