Fraunhofer Institut für Integrierte Publikations- und Informationssysteme IPSI
Publication: Efficient entity resolution for large heterogeneous information spaces (2011)
Papadakis, G.; Ioannou, E.; Niederée, C.; Fankhauser, P.
We have recently witnessed an enormous growth in the volume of structured and semi-structured data sets available on the Web. An important prerequisite for using and combining such data sets is the detection and merging of information that describes the same real-world entities, a task known as Entity Resolution. To make this quadratic task efficient, blocking techniques are typically employed. However, the high dynamics, loose schema binding, and heterogeneity of (semi-)structured data impose new challenges on entity resolution. Existing blocking approaches become inapplicable because they rely on the homogeneity of the considered data and on schemata known a priori. In this paper, we introduce a novel approach for entity resolution, scaling it up for large, noisy, and heterogeneous information spaces. It combines an attribute-agnostic mechanism for building blocks with intelligent block processing techniques that boost blocks with high expected utility, propagate knowledge about identified matches, and preempt the resolution process when it gets too expensive. Our extensive evaluation on real-world, large, heterogeneous data sets verifies that the suggested approach is both effective and efficient. Copyright 2011 ACM.
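As a rough illustration of what attribute-agnostic blocking can look like, the sketch below groups entities by the tokens that occur in any of their attribute values, ignoring attribute names entirely. This is a toy under stated assumptions, not the authors' implementation: the function names `token_blocks` and `candidate_pairs` and the sample records are hypothetical, and the block processing steps named in the abstract (utility-based ordering, match propagation, preemption) are not modeled.

```python
from collections import defaultdict
from itertools import combinations
import re

def token_blocks(entities):
    """Map each token found in any attribute value to the ids of entities containing it."""
    blocks = defaultdict(set)
    for entity_id, attributes in entities.items():
        # Attribute names are ignored on purpose: only the values contribute tokens.
        for value in attributes.values():
            for token in re.findall(r"\w+", str(value).lower()):
                blocks[token].add(entity_id)
    # A block holding a single entity produces no comparisons, so it is dropped.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}

def candidate_pairs(blocks):
    """Collect the distinct entity pairs that share at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs

# Hypothetical records with different schemata describing overlapping entities.
entities = {
    "a": {"name": "Ada Lovelace", "org": "Analytical Society"},
    "b": {"fullName": "A. Lovelace", "affiliation": "London"},
    "c": {"title": "Difference Engine"},
}
blocks = token_blocks(entities)
pairs = candidate_pairs(blocks)
```

Only pairs that share a block are compared afterwards, which is how blocking avoids comparing every entity with every other one.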
Publication: DivQ: Diversification for keyword search over structured databases (2010)
Demidova, E.; Fankhauser, P.; Zhou, X.; Nejdl, W.
Keyword queries over structured databases are notoriously ambiguous. No single interpretation of a keyword query can satisfy all users, and multiple interpretations may yield overlapping results. This paper proposes a scheme to balance the relevance and novelty of keyword search results over structured databases. Firstly, we present a probabilistic model which effectively ranks the possible interpretations of a keyword query over structured data. Then, we introduce a scheme to diversify the search results by re-ranking query interpretations, taking into account redundancy of query results. Finally, we propose α-nDCG-W and WS-recall, adaptations of the α-nDCG and S-recall metrics that take into account graded relevance of subtopics. Our evaluation on two real-world datasets demonstrates that search results obtained using the proposed diversification algorithms better characterize possible answers available in the database than the results of the initial relevance ranking.
Publication: The missing links: Discovering hidden same-as links among a billion of triples (2010)
Papadakis, G.; Demartini, G.; Fankhauser, P.; Kärger, P.
The Semantic Web is constantly gaining momentum, as more and more Web sites and content providers adopt its principles. At the core of these principles lies the Linked Data movement, which demands that data on the Web be annotated and linked among different sources, instead of being isolated in data silos. In order to materialize this vision of a web of semantics, existing resource identifiers should be reused and shared between different Web sites. This is not always the case with the current state of the Semantic Web, since multiple identifiers are, more often than not, redundantly introduced for the same resources. In this paper we introduce a novel approach to automatically detect redundant identifiers solely by matching the URIs of information resources. The approach, based on a common pattern among Semantic Web URIs, provides a simple and practical method for duplicate detection. We apply this method to a large snapshot of the current Semantic Web comprising 1.15 billion statements and estimate the number of hidden duplicates in it. The outcomes of our experiments confirm the effectiveness as well as the efficiency of our method, and suggest that URI matching can be used as a scalable filter for discovering implicit same-as links.
Publication: Language models & topic models for personalizing tag recommendation (2010)
Krestel, R.; Fankhauser, P.
More and more content on the Web is generated by users. To organize this information and make it accessible via current search technology, tagging systems have gained tremendous popularity. Especially for multimedia content, they allow resources to be annotated with keywords (tags), which opens the door for classic text-based information retrieval. To support the user in choosing the right keywords, tag recommendation algorithms have emerged. In this setting, not only the content is decisive for recommending relevant tags but also the user's preferences. In this paper we introduce an approach to personalized tag recommendation that combines a probabilistic model of tags from the resource with tags from the user. As models we investigate simple language models as well as Latent Dirichlet Allocation. Extensive experiments on a real-world dataset crawled from a big tagging system show that personalization improves tag recommendation, and our approach significantly outperforms state-of-the-art approaches.
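One simple way to picture combining a resource tag model with a user tag model is a linear interpolation of two unigram language models. The sketch below is a minimal illustration under assumptions and not the method of the paper: the mixing parameter `weight`, the maximum-likelihood estimates, the function names, and the sample tags are all hypothetical, and Latent Dirichlet Allocation is not covered.

```python
from collections import Counter

def language_model(tags):
    """Maximum-likelihood unigram model: relative frequency of each tag."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

def recommend(resource_tags, user_tags, weight=0.5, k=3):
    """Rank tags by an interpolation of the resource model and the user model."""
    p_resource = language_model(resource_tags)
    p_user = language_model(user_tags)
    vocabulary = set(p_resource) | set(p_user)
    scores = {
        tag: weight * p_resource.get(tag, 0.0) + (1 - weight) * p_user.get(tag, 0.0)
        for tag in vocabulary
    }
    # Sort by descending score; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda tag: (-scores[tag], tag))[:k]

# Hypothetical tags already assigned to the resource and tags used before by the user.
top = recommend(
    ["python", "code", "python", "tutorial"],
    ["python", "ml", "ml"],
    k=2,
)
```

With `weight=1.0` the ranking depends on the resource alone, so lowering the weight is what introduces personalization in this toy setup.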
Publication: It's all in the (ambient) environment: Designing experiences in ubiquitous hybrid worlds (2008)
Streitz, N.A.
The objective of this invited talk is to present selected visions of ubiquitous computing and ambient communication based on the notion of the disappearing computer and to reflect on the resulting challenges for designing experiences in future smart environments. Our approach places the human at the centre of our design considerations and is based on exploiting the affordances of real objects by augmenting their physical properties with the potential of computer-based support. Combining the best of both worlds requires an integration of real and virtual worlds resulting in hybrid worlds. In this approach, the computer "disappears" and is almost "invisible" but its functionality is ubiquitously available and provides new forms of interacting with information. The approach can be summarized by the statement that "the world around us is the interface to information and for conveying experiences".
Publication: Context-oriented communication and the design of computer-supported discursive learning (2008)
Herrmann, T.; Kienle, A.
Computer-supported discursive learning (CSDL) systems for the support of asynchronous discursive learning need to fulfil specific socio-technical conditions. To understand these conditions, we employed design experiments combining aspects of communication theory, empirical findings, and continuous improvement of the investigated prototypes. Our theoretical perspective starts with a context-oriented model of communication which is, as a result of the experiments, extended by including the role of a third party such as a facilitator. The theory-driven initial design requirements lead to the CSCL prototype KOLUMBUS, which emphasizes the role of annotations. In KOLUMBUS, annotations can be immediately embedded in their context of learning material. Practical experience with the prototype in five cases reveals possibilities for implementing improvements and observing their impact. On this basis, we provide guidelines for the design of CSDL systems that focus on the support of asynchronous discursive learning.
Publication: From cognitive compatibility to the disappearing computer: Experience design for smart environments (2008)
Streitz, N.
The objective of this keynote talk is to present selected visions of ambient and ubiquitous computing based on the notion of the disappearing computer and to reflect on the resulting challenges for designing experiences in future smart environments. It is a human-centred approach exploiting the affordances of real objects by augmenting their physical properties with the potential of computer-based support. In this approach, the computer "disappears" and is almost "invisible" but its functionality is ubiquitously available and provides new forms of interaction, communication and collaboration. In summary: the world around us is the interface to information and for conveying experiences.
Publication: A service architecture for context awareness and reaction provisioning (2007)
Silva Santos, L.O.B. da; Ramparany, F.; Costa, P.D.; Vink, P.; Etter, R.; Broens, T.
Context awareness has emerged as an important element in distributed computing. It offers mechanisms allowing applications to be aware of their environment and enabling them to adjust their behavior to the current context. In order to keep track of the relevant context information, a flexible service mechanism should be available for the client applications. In this paper we present a service architecture to provide context-awareness capabilities to users and client applications. Moreover, the service is able to react depending on the user's preferences and context. The conditions for the reaction and the reaction itself are defined in rules the users submit to the service by means of a convenient rule language.
Publication: Interdisciplinarity in the CSCL community - An empirical study (2007)
Wessner, M.; Kienle, A.
In previous work the CSCL community was analysed with respect to its scope, development, continuity and connectivity (Hoadley 2005, Kienle & Wessner 2005, Kienle & Wessner 2006). Main insights included a relatively low but stable continuity of individuals in the community, increasing international participation and increasing connectivity across different countries. Concerning the disciplines involved in CSCL and the disciplinary backgrounds of CSCL community members, it was found that a variety of disciplines are represented in the community. A detailed analysis of the way these disciplines contribute to the progress of CSCL, and of the way members with different disciplinary backgrounds collaborate, is still missing. In this paper we report an analysis of the CSCL community with respect to the disciplinary background of its members and the interrelation of various disciplines in CSCL. The analysis is based on a survey among members of the CSCL community actively involved in the CSCL 2007 conference (reviewers and authors of accepted contributions). The paper reports and discusses main results of this analysis with respect to the disciplinary background of CSCL community members as well as links between the disciplines. In addition, it provides insights into motives for interdisciplinary collaboration and into beneficial and hindering factors. The results should help to sharpen our view of the CSCL community, contribute to a shared understanding about what CSCL (currently) is (and what it is not), and point out perspectives for future development of the CSCL community.
Publication: Digitale Wasserzeichen in eHealth-Anwendungen als Schutzmechanismus für Multimedia-Dateien [Digital watermarks in eHealth applications as a protection mechanism for multimedia files] (2007)
Steinebach, M.; Croce-Ferri, L.; Pharow, P.