Publication

Unsupervised duplicate detection using sample non-duplicates

2006 , Lehti, P. , Fankhauser, P.

The problem of identifying objects in databases that refer to the same real-world entity is known, among other names, as duplicate detection or record linkage. Objects may be duplicates even though they are not identical, owing to errors and missing data. Typical current methods require a deep understanding of the application domain or a good representative training set, which entails significant costs. In this paper we present an unsupervised, domain-independent approach to duplicate detection that starts with a broad alignment of potential duplicates and analyses the distribution of observed similarity values among these potential duplicates and among representative sample non-duplicates to improve the initial alignment. Additionally, the presented approach is not only able to align flat records but also makes use of related objects, which may significantly increase the alignment accuracy. Evaluations show that our approach outperforms other unsupervised approaches and reaches almost the same accuracy as even fully supervised, domain-dependent approaches.
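The idea of using the similarity values of sample non-duplicates to prune a broad candidate alignment can be illustrated with a minimal sketch. This is not the authors' method: the mean-plus-k-standard-deviations cutoff, the `difflib`-based string similarity, and all function names are assumptions chosen only for illustration.

```python
from difflib import SequenceMatcher
from statistics import mean, stdev


def similarity(a: str, b: str) -> float:
    """Generic string similarity in [0, 1] (an illustrative stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def estimate_threshold(non_duplicate_scores, k=2.0):
    """Derive a cutoff from the scores of pairs known NOT to be duplicates.

    Assumption: scores more than k sample standard deviations above the
    non-duplicate mean are unlikely to come from non-duplicates.
    """
    mu = mean(non_duplicate_scores)
    sigma = stdev(non_duplicate_scores)
    return min(mu + k * sigma, 1.0)


def refine_alignment(candidate_scores, non_duplicate_scores, k=2.0):
    """Keep only the candidate pairs whose score exceeds the estimated cutoff."""
    threshold = estimate_threshold(non_duplicate_scores, k)
    return {pair for pair, score in candidate_scores.items() if score > threshold}


# Hypothetical scores: a broad initial alignment and a sample of non-duplicates.
candidates = {("r1", "r2"): 0.9, ("r3", "r4"): 0.35, ("r5", "r6"): 0.6}
non_duplicates = [0.1, 0.2, 0.3]
kept = refine_alignment(candidates, non_duplicates)
```

With these hypothetical scores the cutoff lands near 0.4, so the pair scored 0.35 is dropped from the alignment while the other two are kept.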

Publication

Queries in context: Access to digitized historic documents in a collaboratory for the humanities

2005 , Thiel, U. , Brocks, H. , Dirsch-Weigand, A. , Everts, A. , Frommholz, I. , Stein, A.

In contrast to standard digital libraries, systems addressing the specific requirements of cultural heritage need to deal with digitized material like scanned documents instead of born-digital items. Such systems aim at providing the means for domain experts, e.g. historians, to work collaboratively with the given material. To support their work, automatic indexing mechanisms for both textual and pictorial digitized documents need to be combined with retrieval methods exploiting the content as well as the context of information items for precise searches. In the COLLATE project we devised several access methods using textual contents, feature extraction from images, metadata, and annotations provided by the users.

Publication

Advanced technologies for adaptive information management systems

2005 , Risse, T.

Publication

Enterprise information integration

2005 , Kamps, T. , Stenzel, R. , Chen, L. , Rostek, L.

Publication

An overview on automatic capacity planning

2005 , Risse, T.

The performance requirement for the transformation of messages within electronic business processes is our motivation to investigate automatic capacity planning methods. Performance typically means the throughput and response time of a system. Finding a configuration of a distributed system that satisfies performance goals is a complex search problem involving many design parameters, such as hardware selection, job distribution and process configuration. Performance models are a powerful tool for analysing potential system configurations; however, their evaluation is expensive, so only a limited number of possible configurations can be evaluated. In this paper we give an overview of our automatic system design method and discuss the problems that arise in achieving the required performance during the runtime of the systems. Furthermore, we discuss the impact of our strategy on current trends in distributed systems.
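Capacity planning as a budgeted search over configurations can be sketched as follows. The abstract does not specify the performance model, so this example assumes a deliberately simple one (load split evenly across identical servers, each treated as an M/M/1 queue); the `Config` fields, the evaluation budget, and the cost figures are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterable, Optional


@dataclass(frozen=True)
class Config:
    servers: int            # number of identical servers
    service_rate: float     # requests per second one server can handle
    cost_per_server: float  # arbitrary cost unit

    @property
    def cost(self) -> float:
        return self.servers * self.cost_per_server


def response_time(cfg: Config, arrival_rate: float) -> Optional[float]:
    """Mean response time under the assumed M/M/1-per-server model.

    Returns None when the configuration is overloaded (unstable queue).
    """
    per_server_load = arrival_rate / cfg.servers
    if per_server_load >= cfg.service_rate:
        return None
    return 1.0 / (cfg.service_rate - per_server_load)


def cheapest_config(configs: Iterable[Config], arrival_rate: float,
                    max_response_time: float, budget: int) -> Optional[Config]:
    """Evaluate at most `budget` configurations; return the cheapest feasible one."""
    best = None
    for cfg in list(configs)[:budget]:
        rt = response_time(cfg, arrival_rate)
        if rt is None or rt > max_response_time:
            continue
        if best is None or cfg.cost < best.cost:
            best = cfg
    return best


# Hypothetical design space: two hardware types, one to four servers each.
hardware = [(50.0, 1.0), (120.0, 3.0)]  # (service_rate, cost_per_server)
space = [Config(n, rate, cost) for (rate, cost), n in product(hardware, range(1, 5))]
best = cheapest_config(space, arrival_rate=100.0, max_response_time=0.045, budget=8)
```

The `budget` parameter stands in for the constraint that model evaluation is expensive: with a budget smaller than the design space, the search may miss feasible configurations entirely, which is why the order in which candidates are explored matters.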

Publication

Understanding and tailoring your scientific information environment: A context-oriented view on e-science support

2005 , Niederée, C. , Stewart, A. , Muscogiuri, C. , Hemmje, M. , Risse, T.

Publication

Modelling interactive, three-dimensional information visualizations

2005 , Jäschke, G. , Gupta, P. , Hemmje, M.

Research on information visualization has so far established an outline of the information visualization process and shed light on a broad range of the detailed aspects involved. However, there is no model in place that describes the nature of information visualization in a coherent, detailed, and well-defined way. We believe that the lack of such a lingua franca hinders communication about, and application of, information visualization techniques. Our approach is to design a declarative language for describing and defining information visualization techniques. The information visualization modelling language (IVML) provides a means to formally express, note, preserve, and communicate the structure, appearance, behaviour, and functionality of information visualization techniques and applications in a standardized way. The anticipated benefits comprise both application and theory.

Publication

Flexible notifications and task models for cooperative work management

2005 , Rubart, J. , Richter, H.

Knowledge intensive cooperative work requires emergent workflow management. Participants interact with the workflow engine and jointly redefine and activate workflow structure. To improve the usability of such systems we present reconfigurable notification mechanisms as well as shared task models that can be used from diverse clients at the same time focusing on different kinds of visualization and navigation.

Publication

Cooperation in ubiquitous computing

2005 , Tandler, P. , Dietz, L.

Many ubiquitous computing scenarios deal with cooperative work situations. To successfully support these situations, computer-supported cooperative work (CSCW) concepts and technologies face new challenges. One of the most fundamental concepts for cooperation is sharing. By analyzing applications of sharing in the context of ubiquitous computing, it can be shown that ubiquitous computing enables an extended view of sharing. In this paper, we show that this extended view seamlessly integrates the view of "traditional" CSCW and additionally incorporates ubiquitous, heterogeneous, and mobile devices used in a common context.

Publication

Secure production of digital media

2005 , Steinebach, M.C. , Dittmann, J.

Today more and more media data is produced completely in the digital domain without the need for analogue input. This increases flexibility and efficiency in media handling, as distributed access, duplication and modification are possible without the need to move or touch physical data carriers. But it also reduces the security of the process: without physical originals to refer to, changes in the material can remain unnoticed, in the end making the manipulated data the new original. Theft and illegal copying in the digital domain can happen without notice and without loss of quality. We therefore see the need to set up secure media production environments in which access control, integrity and copyright protection as well as traceability of individual copies are enabled. Addressing this need, we design a framework for media production environments in which mechanisms like encryption, digital signatures and digital watermarking help to enable flexible yet secure handling and processing of the content.
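The integrity aspect, detecting unnoticed changes to digital material, can be illustrated with a small sketch. It is not the framework described in the abstract: a keyed hash (HMAC-SHA256 from the Python standard library) is used here as a simplified stand-in for a digital signature, and the key, function names, and sample data are hypothetical.

```python
import hashlib
import hmac


def tag_media(key: bytes, media: bytes) -> str:
    """Compute an integrity tag for the media bytes (HMAC-SHA256, hex-encoded)."""
    return hmac.new(key, media, hashlib.sha256).hexdigest()


def verify_media(key: bytes, media: bytes, expected_tag: str) -> bool:
    """Return True only if the media still matches the tag recorded earlier."""
    actual = tag_media(key, media)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(actual, expected_tag)


# Hypothetical production step: tag the original, then check a later copy.
key = b"studio-secret-key"           # placeholder; a real key would be managed securely
original = b"\x00\x01 raw audio frames ..."
tag = tag_media(key, original)

untouched_ok = verify_media(key, original, tag)
tampered_ok = verify_media(key, original + b"edit", tag)
```

Unlike a true digital signature, an HMAC requires the verifier to hold the same secret key as the producer, so it demonstrates change detection but not third-party verifiability.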