  • Publication
    Unsupervised duplicate detection using sample non-duplicates
    (2006)
    Lehti, P.; Fankhauser, P.
    The problem of identifying objects in databases that refer to the same real-world entity is known, among other names, as duplicate detection or record linkage. Objects may be duplicates even though they are not identical, owing to errors and missing data. Typical current methods require either a deep understanding of the application domain or a good representative training set, both of which entail significant costs. In this paper we present an unsupervised, domain-independent approach to duplicate detection that starts with a broad alignment of potential duplicates and analyses the distribution of observed similarity values among these potential duplicates and among representative sample non-duplicates to improve the initial alignment. Additionally, the presented approach is not only able to align flat records but also makes use of related objects, which may significantly increase the alignment accuracy. Evaluations show that our approach outperforms other unsupervised approaches and reaches almost the same accuracy as even fully supervised, domain-dependent approaches.
  • Publication
    Database Integration Using the Open Object-Oriented Database System VODAK
    (1998)
    Klas, W.; Fankhauser, P.; Muth, P.; Rakow, T.C.; Neuhold, E.J.
  • Publication
    Error-tolerant document structure analysis
    (1998)
    Klein, B.; Fankhauser, P.
  • Publication
    Database integration using the open object-oriented multidatabase system VODAK
    (1996)
    Klas, W.; Fankhauser, P.; Muth, P.; Rakow, T.C.; Neuhold, E.J.