Fraunhofer-Institut für Integrierte Publikations- und Informationssysteme IPSI
Publication: Efficient entity resolution for large heterogeneous information spaces (2011)
Papadakis, G.; Ioannou, E.; Niederée, C.; Fankhauser, P.
We have recently witnessed an enormous growth in the volume of structured and semi-structured data sets available on the Web. An important prerequisite for using and combining such data sets is the detection and merging of information that describes the same real-world entities, a task known as Entity Resolution. To make this quadratic task efficient, blocking techniques are typically employed. However, the high dynamics, loose schema binding, and heterogeneity of (semi-)structured data impose new challenges on entity resolution. Existing blocking approaches become inapplicable because they rely on the homogeneity of the considered data and on a priori known schemata. In this paper, we introduce a novel approach to entity resolution that scales up to large, noisy, and heterogeneous information spaces. It combines an attribute-agnostic mechanism for building blocks with intelligent block processing techniques that boost blocks with high expected utility, propagate knowledge about identified matches, and preempt the resolution process when it gets too expensive. Our extensive evaluation on large, real-world, heterogeneous data sets verifies that the suggested approach is both effective and efficient. Copyright 2011 ACM.
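To illustrate what "attribute-agnostic" block building can mean, here is a minimal sketch assuming a simple token-based scheme: every token of every attribute value becomes a block key, regardless of which attribute it came from, so entities with different schemata can still share blocks. The function names are hypothetical, and the paper's block processing steps (boosting, match propagation, preemption) are not shown.

```python
import re
from collections import defaultdict
from itertools import combinations


def token_blocking(entities):
    """Map each token to the set of entity ids whose attribute values contain it.

    Attribute names are ignored entirely, so no schema alignment is needed.
    """
    blocks = defaultdict(set)
    for entity_id, attributes in entities.items():
        for value in attributes.values():  # attribute names are never consulted
            for token in re.findall(r"\w+", value.lower()):
                blocks[token].add(entity_id)
    # Blocks with a single entity yield no comparisons, so drop them.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Collect the distinct entity pairs that co-occur in at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


entities = {
    "e1": {"name": "Jack Miller", "job": "engineer"},
    "e2": {"fullName": "J. Miller", "occupation": "Engineer"},
    "e3": {"title": "Paris Hotel"},
}
print(sorted(candidate_pairs(token_blocking(entities))))  # [('e1', 'e2')]
```

Only `e1` and `e2` are compared, even though they describe the person with different attribute names; `e3` shares no token with either and is never compared.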
Publication: DivQ: Diversification for keyword search over structured databases (2010)
Demidova, E.; Fankhauser, P.; Zhou, X.; Nejdl, W.
Keyword queries over structured databases are notoriously ambiguous. No single interpretation of a keyword query can satisfy all users, and multiple interpretations may yield overlapping results. This paper proposes a scheme to balance the relevance and novelty of keyword search results over structured databases. First, we present a probabilistic model that effectively ranks the possible interpretations of a keyword query over structured data. Then, we introduce a scheme to diversify the search results by re-ranking query interpretations, taking into account the redundancy of query results. Finally, we propose α-nDCG-W and WS-recall, adaptations of the α-nDCG and S-recall metrics that take into account the graded relevance of subtopics. Our evaluation on two real-world datasets demonstrates that search results obtained using the proposed diversification algorithms better characterize the possible answers available in the database than the results of the initial relevance ranking.
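For orientation, the sketch below computes the standard α-nDCG metric that α-nDCG-W adapts; it does not implement the paper's graded-relevance variant. Each document's gain for a subtopic decays by a factor of (1 − α) for every earlier document that already covered that subtopic, so redundant results earn less. Because the exact ideal ranking is expensive to compute, it is approximated greedily here. The binary judgment format (document id mapped to the set of subtopics it covers) and the function names are assumptions for illustration.

```python
import math


def alpha_dcg(ranking, judgments, alpha=0.5, k=None):
    """Discounted cumulative gain with novelty-decayed subtopic gains."""
    seen = {}  # subtopic -> number of earlier documents covering it
    score = 0.0
    for rank, doc in enumerate(ranking[:k], start=1):
        gain = 0.0
        for subtopic in judgments.get(doc, set()):
            gain += (1 - alpha) ** seen.get(subtopic, 0)
            seen[subtopic] = seen.get(subtopic, 0) + 1
        score += gain / math.log2(1 + rank)
    return score


def greedy_ideal(judgments, alpha=0.5, k=None):
    """Approximate the ideal ranking by repeatedly picking the largest marginal gain."""
    remaining = set(judgments)
    seen, ideal = {}, []
    limit = len(remaining) if k is None else min(k, len(remaining))
    while len(ideal) < limit:
        best = max(sorted(remaining),  # sorted() makes tie-breaking deterministic
                   key=lambda d: sum((1 - alpha) ** seen.get(s, 0) for s in judgments[d]))
        ideal.append(best)
        remaining.remove(best)
        for s in judgments[best]:
            seen[s] = seen.get(s, 0) + 1
    return ideal


def alpha_ndcg(ranking, judgments, alpha=0.5, k=None):
    """Normalize alpha-DCG by the (greedily approximated) ideal alpha-DCG."""
    ideal = alpha_dcg(greedy_ideal(judgments, alpha, k), judgments, alpha, k)
    return alpha_dcg(ranking, judgments, alpha, k) / ideal if ideal > 0 else 0.0


judgments = {"d1": {"a"}, "d2": {"a"}, "d3": {"b"}}
print(alpha_ndcg(["d1", "d3", "d2"], judgments))  # diversified order scores 1.0
print(alpha_ndcg(["d1", "d2", "d3"], judgments))  # redundant d2 ranked second scores lower
```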
Publication: The missing links: Discovering hidden same-as links among a billion of triples (2010)
Papadakis, G.; Demartini, G.; Fankhauser, P.; Kärger, P.
The Semantic Web is constantly gaining momentum, as more and more Web sites and content providers adopt its principles. At the core of these principles lies the Linked Data movement, which demands that data on the Web be annotated and linked across different sources instead of being isolated in data silos. To materialize this vision of a web of semantics, existing resource identifiers should be reused and shared between different Web sites. This is not always the case in the current Semantic Web, since multiple identifiers are, more often than not, redundantly introduced for the same resources. In this paper we introduce a novel approach that automatically detects redundant identifiers solely by matching the URIs of information resources. The approach, based on a common pattern among Semantic Web URIs, provides a simple and practical method for duplicate detection. We apply this method to a large snapshot of the current Semantic Web comprising 1.15 billion statements and estimate the number of hidden duplicates in it. The outcomes of our experiments confirm both the effectiveness and the efficiency of our method, and suggest that URI matching can be used as a scalable filter for discovering implicit same-as links.
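The abstract states only that duplicates are found by matching URIs according to a common pattern. As a rough illustration of URI matching used as a filter, the sketch below assumes a hypothetical key: the URI fragment if present, otherwise the last path segment, lower-cased with underscores replaced by spaces. URIs from different hosts that share this key become candidate same-as links. This heuristic is an assumption, not the pattern used in the paper.

```python
from collections import defaultdict
from itertools import combinations
from urllib.parse import unquote, urlparse


def local_name(uri):
    """Hypothetical matching key: fragment if present, else last non-empty path segment."""
    parsed = urlparse(uri)
    if parsed.fragment:
        name = parsed.fragment
    else:
        segments = [s for s in parsed.path.split("/") if s]
        name = segments[-1] if segments else ""
    return unquote(name).lower().replace("_", " ").strip()


def candidate_same_as(uris):
    """Pair up URIs from different hosts whose local names are identical."""
    groups = defaultdict(set)
    for uri in uris:
        key = local_name(uri)
        if key:
            groups[key].add(uri)
    links = set()
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            # Only cross-source pairs are interesting as hidden same-as links.
            if urlparse(a).netloc != urlparse(b).netloc:
                links.add((a, b))
    return links


uris = [
    "http://dbpedia.org/resource/Tim_Berners-Lee",
    "http://example.org/people#tim_berners-lee",
    "http://dbpedia.org/resource/Berlin",
]
print(candidate_same_as(uris))
```

Because grouping by a string key is a single hash pass, a filter like this scales linearly with the number of URIs; the candidates it produces would still need verification.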
Publication: Language models & topic models for personalizing tag recommendation (2010)
Krestel, R.; Fankhauser, P.
More and more content on the Web is generated by users. To organize this information and make it accessible via current search technology, tagging systems have gained tremendous popularity. Especially for multimedia content, they allow users to annotate resources with keywords (tags), which opens the door to classic text-based information retrieval. To support users in choosing the right keywords, tag recommendation algorithms have emerged. In this setting, not only the content but also the user's preferences are decisive for recommending relevant tags. In this paper we introduce an approach to personalized tag recommendation that combines a probabilistic model of tags from the resource with tags from the user. As models we investigate simple language models as well as Latent Dirichlet Allocation. Extensive experiments on a real-world dataset crawled from a large tagging system show that personalization improves tag recommendation, and our approach significantly outperforms state-of-the-art approaches.
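One simple way to combine a resource tag model with a user tag model is linear interpolation of two maximum-likelihood language models. The sketch below assumes that combination rule and a hypothetical mixing weight `lam`; the paper's exact combination and its Latent Dirichlet Allocation variant are not reproduced here.

```python
from collections import Counter


def mle(tags):
    """Maximum-likelihood unigram model: P(t) = count(t) / total tag count."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def recommend(resource_tags, user_tags, lam=0.5, n=3):
    """Rank tags by lam * P(t | resource) + (1 - lam) * P(t | user).

    lam is a hypothetical mixing weight: 1.0 ignores the user, 0.0 ignores the resource.
    """
    p_res, p_user = mle(resource_tags), mle(user_tags)
    vocab = set(p_res) | set(p_user)
    scores = {t: lam * p_res.get(t, 0.0) + (1 - lam) * p_user.get(t, 0.0) for t in vocab}
    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:n]


resource_tags = ["sunset", "beach", "beach"]
user_tags = ["bw", "beach", "bw", "bw"]
print(recommend(resource_tags, user_tags))           # personalized ranking
print(recommend(resource_tags, user_tags, lam=1.0))  # resource-only ranking
```

With `lam=0.5`, the user's habitual tag `bw` outranks `sunset`, which illustrates how the user model shifts recommendations toward personal preferences.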
Publication: From cognitive compatibility to the disappearing computer: Experience design for smart environments (2008)
Streitz, N.
The objective of this keynote talk is to present selected visions of ambient and ubiquitous computing based on the notion of the disappearing computer and to reflect on the resulting challenges for designing experiences in future smart environments. It is a human-centred approach that exploits the affordances of real objects by augmenting their physical properties with the potential of computer-based support. In this approach, the computer "disappears" and is almost "invisible", but its functionality is ubiquitously available and provides new forms of interaction, communication, and collaboration. In summary: the world around us is the interface to information and for conveying experiences.
Publication: It's all in the (ambient) environment: Designing experiences in ubiquitous hybrid worlds (2008)
Streitz, N.A.
The objective of this invited talk is to present selected visions of ubiquitous computing and ambient communication based on the notion of the disappearing computer and to reflect on the resulting challenges for designing experiences in future smart environments. Our approach places the human at the centre of our design considerations and is based on exploiting the affordances of real objects by augmenting their physical properties with the potential of computer-based support. Combining the best of both worlds requires an integration of real and virtual worlds, resulting in hybrid worlds. In this approach, the computer "disappears" and is almost "invisible", but its functionality is ubiquitously available and provides new forms of interacting with information. The approach can be summarized by the statement that "the world around us is the interface to information and for conveying experiences".
Publication: Context-oriented communication and the design of computer-supported discursive learning (2008)
Herrmann, T.; Kienle, A.
Computer-supported discursive learning (CSDL) systems for the support of asynchronous discursive learning need to fulfil specific socio-technical conditions. To understand these conditions, we employed design experiments combining aspects of communication theory, empirical findings, and continuous improvement of the investigated prototypes. Our theoretical perspective starts with a context-oriented model of communication which, as a result of the experiments, is extended to include the role of a third party such as a facilitator. The theory-driven initial design requirements led to the CSCL prototype KOLUMBUS, which emphasizes the role of annotations. In KOLUMBUS, annotations can be embedded directly in the context of the learning material. Practical experience with the prototype in five cases revealed opportunities to implement improvements and observe their impact. On this basis, we provide guidelines for the design of CSDL systems that focus on the support of asynchronous discursive learning.
Publication: New ERCIM Working Group on "Smart Environments and Systems for Ambient Intelligence" (2007)
Savidis, A.; Streitz, N.
Publication: Digitale Wasserzeichen in eHealth-Anwendungen als Schutzmechanismus für Multimedia-Dateien [Digital watermarks in eHealth applications as a protection mechanism for multimedia files] (2007)
Steinebach, M.; Croce-Ferri, L.; Pharow, P.
Publication: Social radio - a music-based approach to emotional awareness mediation (2007)
Röcker, C.; Etter, R.