Publications Search Results

  • Publication
    RDF data storage and query processing schemes
    (2018)
    Wylot, Marcin; Cudré-Mauroux, Philippe; Sakr, Sharif
    The Resource Description Framework (RDF) is a core data representation format for Linked Data and the Semantic Web. It supports a generic graph-based data model for describing things, including their relationships with other things. As RDF datasets grow rapidly, RDF data management systems must be able to cope with ever larger amounts of data. Although RDF data can be physically stored in a single relational table, querying a giant triple table becomes very expensive because of the multiple nested joins required to answer graph queries. In addition, the heterogeneity of RDF data poses entirely new challenges to database systems. This article provides a comprehensive study of the state of the art in handling and querying RDF data. In particular, we focus on data storage techniques, indexing strategies, and query execution mechanisms. Moreover, we provide a classification of existing systems and approaches. We also provide an overview of the various benchmarking efforts in this context and discuss some of the open problems in this domain.
  • Publication
    Storing, tracking, and querying provenance linked data
    (2017)
    Wylot, Marcin; Cudré-Mauroux, Philippe; Groth, Paul
    The proliferation of heterogeneous Linked Data on the Web poses new challenges to database systems. In particular, the capacity to store, track, and query provenance data is becoming a pivotal feature of modern triplestores. We present methods extending a native RDF store to efficiently handle the storage, tracking, and querying of provenance in RDF data. We describe a reliable and understandable specification of the way results were derived from the data and how particular pieces of data were combined to answer a query. Subsequently, we present techniques to tailor queries with provenance data. We empirically evaluate the presented methods and show that the overhead of storing and tracking provenance is acceptable. Finally, we show that tailoring a query with provenance information can also significantly improve the performance of query execution.
  • Publication
    Non-native RDF storage engines
    (2017)
    Wylot, Marcin; Grund, Martin; Sakr, Sharif; Cudré-Mauroux, Philippe
    The proliferation of heterogeneous Linked Data requires data management systems to constantly improve their scalability and efficiency. Linked Data can be stored according to many different data storage models. Some of these use general-purpose database storage techniques to persist Linked Data and can therefore leverage existing data processing environments (e.g., large Hadoop clusters). We therefore look at the multiplicity of Linked Data storage systems, which we categorize into the following classes: relational database-based systems, NoSQL-based systems, and massively parallel systems.
  • Publication
    Semantic gossiping. Fostering semantic interoperability in peer data management systems
    (2006)
    Aberer, Karl; Cudré-Mauroux, Philippe
    Until recently, most data integration techniques revolved around central approaches, e.g., global schemas, to enable transparent access to heterogeneous databases. However, with the advent of the Internet and the democratization of tools facilitating knowledge elicitation in machine-processable formats, the situation is quickly evolving. One cannot rely on global, centralized schemas anymore as knowledge creation and consumption are getting increasingly dynamic and decentralized. Peer Data Management Systems (PDMS) address this problem by eliminating centralization and instead applying compositions of local, pairwise mappings to propagate queries among databases. We present a method to foster global semantic interoperability in PDMS settings in a totally decentralized way, based on the analysis of the semantic graph linking data sources with pairwise semantic mappings. We describe how quality measures for the mappings can automatically be derived by analyzing transitive closures of mapping operations. The information obtained from these analyses is then used by the peers to route queries in a network of semantically heterogeneous sources, and to iteratively correct erroneous mappings in a self-organizing way. Additionally, we present heuristics to analyze semantic interoperability in large and heterogeneous communities. Finally, we describe GridVine, which implements our approach and provides a semantic overlay, to demonstrate how our approach can be deployed in a practical setting.