Constraint driven schema merging

Li, X.

2012

Doctoral Thesis

Abstract

Schema integration is the process of consolidating several source schemas into a unified view, called the mediated schema, so that information scattered across the sources can be served uniformly from the mediated schema. Schema integration occurs in many scenarios, such as data integration, logical database design, data warehousing, and schema evolution. To make the mediated schema useful for data interoperability tasks, mappings between the source schemas and the mediated schema have to be derived. Previous approaches fall short in two respects. First, the identification of inter-schema relationships (i.e., schema matching) is usually mixed with the process of combining and restructuring schemas (i.e., schema merging). This coupling of schema matching and schema merging increases the complexity of the schema integration process and the amount of human intervention it requires. Second, the schema mappings are either conceptual alignments between entity types or syntactic correspondences between attributes. Neither of these two mapping languages is able to express complex relationships among several modeling constructs. Logical schema mappings in the form of data dependencies are able to express such complex relationships but have been less explored for schema merging. In this thesis, we propose a new approach to schema merging using logical schema mappings, more specifically tuple-generating dependencies (tgds) and equality-generating dependencies (egds). We provide well-founded semantics of schema merging under two scenarios: view integration and data integration. Based on the formal characterization of the schema merging problem, we develop a schema minimization approach which generates minimal mediated schemas with the same query answering capacity as the source schemas. We study the complexity of the proposed algorithms and show that the schema minimization problems are intractable in the general case. However, we have identified syntactic constraints on the input mappings which ensure that the proposed algorithms are in PTIME. In addition, we have implemented the schema merging algorithms in a prototype. The evaluation on real-world and synthetic data sets shows the applicability and scalability of the approach.
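
For readers unfamiliar with the two dependency classes named in the abstract, the block below gives their standard general forms, followed by one illustrative instance of each. Here \varphi and \psi stand for conjunctions of relational atoms. The relations Emp and Dept and their attributes are hypothetical and are not taken from the thesis; this is only a sketch of the notation, not of the mappings the thesis actually uses.

```latex
% General form of a tuple-generating dependency (tgd):
% whenever the premise holds, some tuples satisfying the conclusion must exist.
\forall \bar{x}\,\bigl(\varphi(\bar{x}) \rightarrow \exists \bar{y}\,\psi(\bar{x},\bar{y})\bigr)

% General form of an equality-generating dependency (egd):
% whenever the premise holds, two of its variables must be equal.
\forall \bar{x}\,\bigl(\varphi(\bar{x}) \rightarrow x_i = x_j\bigr)

% Hypothetical tgd: every employee's department has some manager recorded.
\forall n\,\forall d\,\bigl(\mathit{Emp}(n,d) \rightarrow \exists m\,\mathit{Dept}(d,m)\bigr)

% Hypothetical egd: a department has at most one manager.
\forall d\,\forall m_1\,\forall m_2\,\bigl(\mathit{Dept}(d,m_1) \wedge \mathit{Dept}(d,m_2) \rightarrow m_1 = m_2\bigr)
```

The hypothetical tgd behaves like an inclusion constraint with an existentially quantified value, while the hypothetical egd behaves like a key (functional dependency) on Dept.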

Thesis note

Aachen, TH, Diss., 2012

Author(s)

Li, X.

Place of publication

Aachen
