Unsupervised duplicate detection using sample non-duplicates

2006, Lehti, P., Fankhauser, P.

The problem of identifying objects in databases that refer to the same real-world entity is known, among other names, as duplicate detection or record linkage. Objects may be duplicates even though they are not identical, due to errors and missing data. Typical current methods require a deep understanding of the application domain or a good representative training set, which entails significant costs. In this paper we present an unsupervised, domain-independent approach to duplicate detection that starts with a broad alignment of potential duplicates and analyses the distribution of observed similarity values among these potential duplicates and among representative sample non-duplicates to improve the initial alignment. Additionally, the presented approach is not only able to align flat records but also makes use of related objects, which may significantly increase the alignment accuracy. Evaluations show that our approach outperforms other unsupervised approaches and reaches almost the same accuracy as even fully supervised, domain-dependent approaches.

Skalierbare Verarbeitung von XML mit Infonyte-DB

2002, Tesch, T., Fankhauser, P., Weitzel, T.

The increasing penetration of IT architectures by XML is leading to ever larger volumes of XML data. These cannot always be processed scalably with the available XML tools. Infonyte-DB, a product of Infonyte GmbH, is a modular XML kernel that can process large volumes of XML data efficiently while using very few resources.

IRO-DB: Making relational and object-oriented database systems interoperable

1996, Fankhauser, P., Finance, B., Klas, W.

SWQL - A query language for data integration based on OWL

2005, Lehti, P., Fankhauser, P.

The Web Ontology Language OWL has been advocated as a suitable model for semantic data integration. Data integration requires expressive means to map between heterogeneous OWL schemas. This paper introduces SWQL (Semantic Web Query Language), a strictly typed query language for OWL, and shows how it can be used for mapping between heterogeneous schemas. In contrast to existing RDF query languages, which focus on selection and navigation, SWQL also supports construction and user-defined functions, allowing integrated global schemas to be instantiated in OWL.

XQuery by the book: The IPSI XQuery Demonstrator

2002, Fankhauser, P., Groh, T., Overhage, S.

The IPSI XQuery Demonstrator (IPSI-XQ) implements the XQuery surface syntax, its mapping to the XQuery Core Language, and the static and dynamic semantics of XQuery Core "by the book", following the formal specification as faithfully as possible. Its main purpose is to provide a framework for testing various language design options, and for experimenting with techniques to use type information for efficiently storing and querying XML.

XML for data warehousing chances and challenges

2003, Fankhauser, P., Klement, T.

Arbitration and Matchmaking for Agents with Conflicting Interests

1999, Tesch, T., Fankhauser, P.