  • Publication
    Efficient entity resolution for large heterogeneous information spaces
    (2011)
    Papadakis, G.; Ioannou, E.; Niederée, C.; Fankhauser, P.
    We have recently witnessed an enormous growth in the volume of structured and semi-structured data sets available on the Web. An important prerequisite for using and combining such data sets is the detection and merging of information that describes the same real-world entities, a task known as Entity Resolution. To make this quadratic task efficient, blocking techniques are typically employed. However, the high dynamics, loose schema binding, and heterogeneity of (semi-)structured data pose new challenges for entity resolution. Existing blocking approaches become inapplicable because they rely on the homogeneity of the considered data and a priori known schemata. In this paper, we introduce a novel approach for entity resolution, scaling it up for large, noisy, and heterogeneous information spaces. It combines an attribute-agnostic mechanism for building blocks with intelligent block processing techniques that boost blocks with high expected utility, propagate knowledge about identified matches, and preempt the resolution process when it gets too expensive. Our extensive evaluation on real-world, large, heterogeneous data sets verifies that the suggested approach is both effective and efficient. Copyright 2011 ACM.
  • Publication
    Language models & topic models for personalizing tag recommendation
    (2010)
    Krestel, R.; Fankhauser, P.
    More and more content on the Web is generated by users. To organize this information and make it accessible via current search technology, tagging systems have gained tremendous popularity. Especially for multimedia content, they allow users to annotate resources with keywords (tags), which opens the door for classic text-based information retrieval. To support the user in choosing the right keywords, tag recommendation algorithms have emerged. In this setting, not only the content is decisive for recommending relevant tags but also the user's preferences. In this paper we introduce an approach to personalized tag recommendation that combines a probabilistic model of tags from the resource with tags from the user. As models we investigate simple language models as well as Latent Dirichlet Allocation. Extensive experiments on a real-world dataset crawled from a big tagging system show that personalization improves tag recommendation, and our approach significantly outperforms state-of-the-art approaches.
  • Publication
    The missing links: Discovering hidden same-as links among a billion of triples
    (2010)
    Papadakis, G.; Demartini, G.; Fankhauser, P.; Kärger, P.
    The Semantic Web is constantly gaining momentum, as more and more Web sites and content providers adopt its principles. At the core of these principles lies the Linked Data movement, which demands that data on the Web shall be annotated and linked among different sources, instead of being isolated in data silos. In order to materialize this vision of a web of semantics, existing resource identifiers should be reused and shared between different Web sites. This is not always the case with the current state of the Semantic Web, since multiple identifiers are, more often than not, redundantly introduced for the same resources. In this paper we introduce a novel approach to automatically detect redundant identifiers solely by matching the URIs of information resources. The approach, based on a common pattern among Semantic Web URIs, provides a simple and practical method for duplicate detection. We apply this method on a large snapshot of the current Semantic Web comprising 1.15 billion statements and estimate the number of hidden duplicates in it. The outcomes of our experiments confirm the effectiveness as well as the efficiency of our method, and suggest that URI matching can be used as a scalable filter for discovering implicit same-as links.
  • Publication
    DivQ: Diversification for keyword search over structured databases
    (2010)
    Demidova, E.; Fankhauser, P.; Zhou, X.; Nejdl, W.
    Keyword queries over structured databases are notoriously ambiguous. No single interpretation of a keyword query can satisfy all users, and multiple interpretations may yield overlapping results. This paper proposes a scheme to balance the relevance and novelty of keyword search results over structured databases. Firstly, we present a probabilistic model which effectively ranks the possible interpretations of a keyword query over structured data. Then, we introduce a scheme to diversify the search results by re-ranking query interpretations, taking into account redundancy of query results. Finally, we propose α-nDCG-W and WS-recall, an adaptation of α-nDCG and S-recall metrics, taking into account graded relevance of subtopics. Our evaluation on two real-world datasets demonstrates that search results obtained using the proposed diversification algorithms better characterize possible answers available in the database than the results of the initial relevance ranking.
  • Publication
    From cognitive compatibility to the disappearing computer: Experience design for smart environments
    (2008)
    Streitz, N.
    The objective of this keynote talk is to present selected visions of ambient and ubiquitous computing based on the notion of the disappearing computer and to reflect on the resulting challenges for designing experiences in future smart environments. It is a human-centred approach exploiting the affordances of real objects by augmenting their physical properties with the potential of computer-based support. In this approach, the computer "disappears" and is almost "invisible" but its functionality is ubiquitously available and provides new forms of interaction, communication and collaboration. In summary: the world around us is the interface to information and for conveying experiences.
  • Publication
    It's all in the (ambient) environment: Designing experiences in ubiquitous hybrid worlds
    (2008)
    Streitz, N.A.
    The objective of this invited talk is to present selected visions of ubiquitous computing and ambient communication based on the notion of the disappearing computer and to reflect on the resulting challenges for designing experiences in future smart environments. Our approach places the human at the centre of our design considerations and is based on exploiting the affordances of real objects by augmenting their physical properties with the potential of computer-based support. Combining the best of both worlds requires an integration of real and virtual worlds resulting in hybrid worlds. In this approach, the computer "disappears" and is almost "invisible" but its functionality is ubiquitously available and provides new forms of interacting with information. The approach can be summarized by the statement that "the world around us is the interface to information and for conveying experiences".
  • Publication
    A service architecture for context awareness and reaction provisioning
    (2007)
    Silva Santos, L.O.B. da; Ramparany, F.; Costa, P.D.; Vink, P.; Etter, R.; Broens, T.
    Context awareness has emerged as an important element in distributed computing. It offers mechanisms allowing applications to be aware of their environment and enabling them to adjust their behavior to the current context. In order to keep track of the relevant context information, a flexible service mechanism should be available for the client applications. In this paper we present a service architecture to provide context-awareness capabilities to users and client applications. Moreover, the service is able to react depending on the user's preferences and context. The conditions for the reaction and the reaction itself are defined in rules the users submit to the service by means of a convenient rule language.