2011
Conference Paper
Title

Efficient entity resolution for large heterogeneous information spaces

Abstract
We have recently witnessed an enormous growth in the volume of structured and semi-structured data sets available on the Web. An important prerequisite for using and combining such data sets is the detection and merging of information that describes the same real-world entities, a task known as Entity Resolution. To make this quadratic task efficient, blocking techniques are typically employed. However, the high dynamics, loose schema binding, and heterogeneity of (semi-)structured data impose new challenges on entity resolution. Existing blocking approaches become inapplicable because they rely on the homogeneity of the considered data and on a priori known schemata. In this paper, we introduce a novel approach for entity resolution, scaling it up for large, noisy, and heterogeneous information spaces. It combines an attribute-agnostic mechanism for building blocks with intelligent block processing techniques that boost blocks with high expected utility, propagate knowledge about identified matches, and preempt the resolution process when it gets too expensive. Our extensive evaluation on real-world, large, heterogeneous data sets verifies that the suggested approach is both effective and efficient. Copyright 2011 ACM.
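The idea of attribute-agnostic blocking can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how blocks are built or how utility is estimated, so the sketch assumes token-based blocking (every token of every attribute value becomes a block key, and attribute names are ignored), uses block size as a stand-in for expected utility, and uses a fixed comparison budget as a stand-in for preemption. The names `tokens`, `build_blocks`, `candidate_pairs`, and `max_comparisons` are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations


def tokens(profile):
    """Collect lowercase alphanumeric tokens from all values, ignoring attribute names."""
    found = set()
    for value in profile.values():
        found.update(re.findall(r"[a-z0-9]+", str(value).lower()))
    return found


def build_blocks(profiles):
    """Assumed blocking scheme: one block per token, holding every profile that contains it."""
    blocks = defaultdict(set)
    for pid, profile in profiles.items():
        for tok in tokens(profile):
            blocks[tok].add(pid)
    # A block with a single profile yields no comparisons, so drop it.
    return {tok: ids for tok, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks, max_comparisons=None):
    """Yield each candidate pair once, visiting small blocks first.

    Assumptions: small blocks are treated as the more useful ones, and the
    optional budget stops the process once it is exhausted.
    """
    seen = set()
    for tok in sorted(blocks, key=lambda t: (len(blocks[t]), t)):
        for pair in combinations(sorted(blocks[tok]), 2):
            if pair in seen:
                continue  # already produced by an earlier block
            if max_comparisons is not None and len(seen) >= max_comparisons:
                return  # budget spent: stop early
            seen.add(pair)
            yield pair


# Made-up profiles with different schemata (illustrative data only).
profiles = {
    "p1": {"name": "Alice Smith", "org": "Acme"},
    "p2": {"fullName": "A. Smith", "affiliation": "Acme Corp"},
    "p3": {"title": "Entity resolution"},
}
blocks = build_blocks(profiles)
pairs = list(candidate_pairs(blocks))
```

Here `p1` and `p2` share the tokens `smith` and `acme` although their attribute names differ, so they end up in the same blocks and are compared once, while `p3` is never compared with either of them.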
Author(s)
Papadakis, G.
Ioannou, E.
Niederée, C.
Fankhauser, P.
Published in
4th ACM International Conference on Web Search and Data Mining, WSDM 2011. Proceedings. CD-ROM
Conference
International Conference on Web Search and Data Mining (WSDM) 2011
DOI
10.1145/1935826.1935903
Language
English
IPSI