  • Publication
    Unsupervised duplicate detection using sample non-duplicates
    (2006)
    Lehti, P.; Fankhauser, P.
    The problem of identifying objects in databases that refer to the same real-world entity is known, among other names, as duplicate detection or record linkage. Objects may be duplicates even though they are not identical, due to errors and missing data. Typical current methods require a deep understanding of the application domain or a good representative training set, which entails significant costs. In this paper we present an unsupervised, domain-independent approach to duplicate detection that starts with a broad alignment of potential duplicates, and analyses the distribution of observed similarity values among these potential duplicates and among representative sample non-duplicates to improve the initial alignment. Additionally, the presented approach is not only able to align flat records, but also makes use of related objects, which may significantly increase the alignment accuracy. Evaluations show that our approach outperforms other unsupervised approaches and reaches almost the same accuracy as even fully supervised, domain-dependent approaches.
  • Publication
    Secure production of digital media
    (2005)
    Steinebach, M.C.; Dittmann, J.
    Today more and more media data is produced completely in the digital domain without the need for analogue input. This brings an increase in flexibility and efficiency in media handling, as distributed access, duplication and modification are possible without the need to move or touch physical data carriers. But this also reduces the security of the process: without physical originals to refer to, changes in the material can remain unnoticed, ultimately making the manipulated data the new original. Theft and illegal copying in the digital domain can happen without notice and without loss of quality. We therefore see the need to set up secure media production environments, where access control, integrity and copyright protection as well as traceability of individual copies are enabled. Addressing this need, we design a framework for media production environments, where mechanisms like encryption, digital signatures and digital watermarking help to enable a flexible yet secure handling and processing of the content.
  • Publication
    Towards supporting annotation for existing web pages enabling hyperstructure-based searching
    (2005)
    Qiu, Z.; Hemmje, M.
    This paper discusses the requirements and tasks in annotating existing Web pages with additional structural and semantic information. It suggests annotating existing Web pages with the concepts of a domain model, and annotating the links between the Web pages with the relations between these concepts. It also suggests explicitly representing high-level hypermedia structures, so-called hypertext composites and contexts, and annotating the corresponding composite and context pages with such structures. RDF and RDF Schema are adopted to represent the domain models and the resulting annotations. The architecture of a prototype annotation tool is outlined and corresponding requirements for automatic annotation support are discussed.
  • Publication
    Exploiting lexical knowledge in learning user profiles for intelligent information access to digital collections
    (2005)
    Semeraro, G.; Lops, P.; Degemmis, M.; Niederée, C.; Stewart, A.
    Algorithms designed to support users in retrieving relevant information base their relevance computations on user profiles, in which representations of the users' interests are maintained. This paper focuses on the use of supervised machine learning techniques to induce user profiles for Intelligent Information Access. The access must be personalized by profiles allowing users to retrieve information on the basis of conceptual content. To address this issue, we propose a method to learn sense-based user profiles based on WordNet, a lexical database.
  • Publication
    An overview on automatic capacity planning
    (2005)
    Risse, T.
    The performance requirement for the transformation of messages within electronic business processes is our motivation to investigate automatic capacity planning methods. Performance typically means the throughput and response time of a system. Finding a configuration of a distributed system that satisfies performance goals is a complex search problem that involves many design parameters, like hardware selection, job distribution and process configuration. Performance models are a powerful tool to analyse potential system configurations; however, their evaluation is expensive, such that only a limited number of possible configurations can be evaluated. In this paper we give an overview of our automatic system design method and discuss the problems that arise in achieving the performance goals during the runtime of the systems. Furthermore, we discuss the impact of our strategy on the current trends in distributed systems.
  • Publication
    Designing smart artifacts for smart environments
    (2005)
    Streitz, N.A.; Röcker, C.; Prante, T.; Alphen, D. van; Stenzel, R.; Magerkurth, C.
    Smart artifacts promise to enhance the relationships among participants in distributed working groups, maintaining personal mobility while offering opportunities for the collaboration, informal communication, and social awareness that contribute to the synergy and cohesiveness inherent in collocated teams.
  • Publication
    SWQL - A query language for data integration based on OWL
    (2005)
    Lehti, P.; Fankhauser, P.
    The Web Ontology Language OWL has been advocated as a suitable model for semantic data integration. Data integration requires expressive means to map between heterogeneous OWL schemas. This paper introduces SWQL (Semantic Web Query Language), a strictly typed query language for OWL, and shows how it can be used for mapping between heterogeneous schemas. In contrast to existing RDF query languages which focus on selection and navigation, SWQL also supports construction and user-defined functions to allow for instantiating integrated global schemas in OWL.
  • Publication
    Using context of a mobile user to prefetch relevant information
    (2005)
    Kirchner, H.
    Providing mobile users with relevant and up-to-date information on the move through wireless communication needs to take the current context of a user into account. In this paper, the context of a user with respect to his movement behaviour as well as device characteristics is under investigation. In outdoor areas, particularly urban ones, sufficient communication bandwidth is often available. In some areas though, especially rural ones, communication bandwidth coverage is often poor. Providing users in such areas with relevant information and making this information available in time is a major challenge. Prefetching tries to overcome these problems by using predefined user context settings. In situations where resource restrictions like limited bandwidth or insufficient memory apply, strategies come into play to optimize the process. Such strategies will be discussed. Evaluating the different types of users supports the approach of getting the relevant information to the user at the right time and the right place.
  • Publication
    Data communication between the German NBC reconnaissance vehicle and its control center unit
    (2005)
    Meissner, A.; Schönfeld, W.
    In Germany, the public safety system is largely organized by the German Federal States, which operate, among other equipment, a fleet of Nuclear, Biological and Chemical Reconnaissance Vehicles (NBC RVs) to take measurements in contaminated areas. Currently, NBC RV staff verbally report measured data to a Control Center Unit (CCU) over the assigned Public Safety Organization (PSO) analog voice radio channel. This procedure has several disadvantages. The channel is not secure and its capacity is wasted, which places a limit on the achievable throughput and thus on the number of NBC RVs that can be operational simultaneously. Also, while data is being reported, other PSO members are blocked from sending, and operating personnel are distracted from other work. To overcome these problems, we propose a heterogeneous and flexible communication platform that complies with reliability and coverage requirements for PSO. More specifically, our proposed system is designed to replace current ways of communicating between NBC RVs and the CCU. A drastically higher amount of data can then be transmitted to the CCU, and it can be processed in a much more effective manner in the CCU as well as in cooperating PSO units. Ultimately, this will improve NBC RV missions and consequently shorten PSO response time when dealing with NBC disasters.
  • Publication
    From human-computer interaction to human-artefact interaction: Interaction design for smart environments
    (2005)
    Streitz, N.A.
    The introduction of computer technology caused a shift away from real objects as sources of information towards desktop computers as the interfaces to information now (re)presented in a digital format. In this paper, I will argue for returning to the real world as the starting point for designing information and communication environments. Our approach is to design environments that exploit the affordances of real world objects and at the same time use the potential of computer-based support. Thus, we move from human-computer interaction to human-artefact interaction. Combining the best of both worlds requires an integration of real and virtual worlds resulting in hybrid worlds. The approach will be demonstrated by sample prototypes we have built, e.g., the Roomware® components and smart artefacts that were developed in the project "Ambient Agoras: Dynamic Information Clouds in a Hybrid World", which was part of the EU-funded proactive initiative "The Disappearing Computer" (DC).