  • Publication
    Towards a decentralized data hub and query system for federated dynamic data spaces
    ( 2023)
    Phuoc, Danh Le
    ;
    Le-Tuan, Anh
    ;
    Kuehn, Uwe A.
    This position paper proposes a hybrid architecture for secure and efficient data sharing and processing across dynamic data spaces. On the one hand, current centralized approaches are plagued by issues such as lack of privacy and control for users, high costs, and poor performance, making these approaches unsuitable for the decentralized data spaces prevalent in Europe and various industries (decentralized on the conceptual and physical levels while centralized in the underlying implementation). On the other hand, decentralized systems face challenges with limited knowledge of/control over the global system, fair resource utilization, and data provenance. Our proposed Semantic Data Ledger (SDL) approach combines the advantages of both architectures to overcome their limitations. SDL allows users to choose the best combination of centralized and decentralized features, providing a decentralized infrastructure for the publication of structured data with machine-readable semantics. It supports expressive structured queries, secure data sharing, and payment mechanisms based on an underlying autonomous ledger, enabling the implementation of economic models and fair-use strategies.
  • Publication
    A decentralised persistent identification layer for DCAT datasets
    The Data Catalogue Vocabulary (DCAT) standard is a popular RDF vocabulary for publishing metadata about data catalogs and a valuable foundation for creating Knowledge Graphs. It has widespread application in the (Linked) Open Data and scientific communities. However, DCAT does not specify a robust mechanism to create and maintain persistent identifiers for the datasets. It relies on Internationalized Resource Identifiers (IRIs), which are not necessarily unique, resolvable, or persistent. This impedes findability, citability, and traceability of derived and aggregated data artifacts. As a remedy, we propose a decentralized identifier registry where persistent identifiers are managed by a set of collaborative distributed nodes. Every node gives full access to all identifiers, since an unambiguous state is shared across all nodes. This facilitates a common view on the identifiers without the need for a (virtually) centralized directory. To support this architecture, we propose a data model and network methodology based on a distributed ledger and the W3C recommendation for Decentralized Identifiers (DID). We implemented our approach as a working prototype on a five-peer test network based on Hyperledger Fabric.
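    To make the idea concrete, the sketch below shows what a ledger-managed, DID-style identifier record for a DCAT dataset could look like and how a node might resolve it. This is a minimal illustration only; the method name `did:datasetid`, the field names, and the resolution function are hypothetical and are not taken from the paper's actual data model.

```python
# Illustrative sketch: a possible DID-style identifier record for a DCAT dataset,
# replicated on every registry node. All names here are hypothetical.
import json

identifier_record = {
    "id": "did:datasetid:6f1c2a",                                # persistent, ledger-managed identifier
    "type": "dcat:Dataset",
    "alsoKnownAs": ["https://example.org/catalog/dataset/42"],   # current (mutable) IRI of the dataset
    "controller": "did:datasetid:org-7b9",                       # party allowed to update the record
    "service": [{
        "type": "DcatMetadata",
        "serviceEndpoint": "https://example.org/catalog/dataset/42.ttl"
    }],
}

def resolve(did: str, ledger: dict) -> dict:
    """Resolve a dataset DID against the shared ledger state held by each node."""
    record = ledger.get(did)
    if record is None:
        raise KeyError(f"unknown identifier: {did}")
    return record

ledger_state = {identifier_record["id"]: identifier_record}
print(json.dumps(resolve("did:datasetid:6f1c2a", ledger_state), indent=2))
```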
  • Publication
    Uncomputation in the Qrisp High-Level Quantum Programming Framework
    ( 2023)
    Seidel, Raphael
    ;
    Tcholtchev, Nikolay Vassilev
    Uncomputation is an essential part of reversible computing and plays a vital role in quantum computing. Using this technique, memory resources can be safely deallocated without performing a non-reversible deletion process. For the case of quantum computing, several algorithms depend on this as they require disentangled states in the course of their execution. Thus, uncomputation is not only about resource management, but is also required from an algorithmic point of view. However, synthesizing uncomputation circuits by hand is tedious, and this task can be automated. In this paper, we describe the interface for automated generation of uncomputation circuits in our Qrisp framework. Our algorithm for synthesizing uncomputation circuits in Qrisp is based on an improved version of “Unqomp”, a solution presented by Paradis et al. Our paper also presents some improvements to the original algorithm in order to make it suitable for the needs of a high-level programming framework. Qrisp itself is a fully compilable, high-level programming language/framework for gate-based quantum computers, which abstracts from many of the underlying hardware details. Qrisp’s goal is to support a high-level programming paradigm as known from classical software development.
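    As a rough illustration of how automated uncomputation is typically invoked from a high-level program, the sketch below builds a small circuit with an ancilla and asks the framework to uncompute it. It is a minimal sketch based on the interface described in the abstract and the public Qrisp documentation; the names `QuantumBool`, `h`, `mcx`, and `uncompute()` are assumptions that should be checked against the current Qrisp API.

```python
# Sketch, assuming the Qrisp API exposes QuantumBool, h, mcx and
# QuantumVariable.uncompute() as described in its documentation.
from qrisp import QuantumBool, h, mcx

a = QuantumBool()
b = QuantumBool()
h(a)   # put the inputs into superposition
h(b)

temp = QuantumBool()      # ancilla holding an intermediate result
result = QuantumBool()
mcx([a, b], temp)         # temp = a AND b
mcx([temp, b], result)    # result depends on the intermediate value

# Ask the framework to synthesize the circuit that disentangles and frees the
# ancilla (based on the improved Unqomp algorithm mentioned in the abstract).
temp.uncompute()

print(result.qs)          # inspect the compiled circuit, including the uncomputation
```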
  • Publication
    Computer Scientist's and Programmer's View on Quantum Algorithms: Mapping Functions' APIs and Inputs to Oracles
    ( 2022)
    Tcholtchev, Nikolay Vassilev
    Quantum Computing (QC) is a promising approach which is expected to boost the development of new services and applications. Specific addressable problems can be tackled through acceleration in computational time and advances with respect to the complexity of the problems, for which QC algorithms can support the solution search. However, QC currently remains a domain that is strongly dominated by a physics perspective. Indeed, in order to bring QC to industrial-grade applications we need to consider multiple perspectives, especially those of software engineering and software application/service programming. Following this line of thought, the current paper presents our computer scientist's view on the aspect of black-box oracles, which are a key construct for the majority of currently available QC algorithms. Thereby, we observe the need for the inputs of API functions from the traditional world of software engineering and (web) services to be mapped to the above-mentioned black-box oracles. Hence, there is a clear requirement for automatically generating oracles for specific types of problems/algorithms based on the concrete input to the corresponding APIs. In this paper, we discuss the above aspects and illustrate them on two QC algorithms, namely the Deutsch-Jozsa and Grover algorithms.
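    The sketch below illustrates the general idea of wrapping a classical predicate (standing in for an API function) into a Grover-style phase oracle. It is the generic textbook construction in plain NumPy, not the paper's specific API-to-oracle mapping; the predicate and qubit count are invented for illustration.

```python
# Sketch: turning a classical black-box predicate into a Grover-style phase
# oracle matrix. Generic construction, not the paper's concrete method.
import numpy as np

def api_predicate(x: int) -> bool:
    """Stand-in for a classical API function; marks the searched-for input."""
    return x == 5

def phase_oracle(predicate, n_qubits: int) -> np.ndarray:
    """Diagonal unitary that flips the phase of every basis state the predicate marks."""
    dim = 2 ** n_qubits
    diag = np.array([-1.0 if predicate(x) else 1.0 for x in range(dim)])
    return np.diag(diag)

oracle = phase_oracle(api_predicate, n_qubits=3)
state = np.full(2 ** 3, 1 / np.sqrt(2 ** 3))    # uniform superposition
print((oracle @ state)[5])                      # amplitude of |101> now has a flipped sign
```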
  • Publication
    Towards building live open scientific knowledge graphs
    ( 2022)
    Le-Tuan, Anh
    ;
    Franzreb, Carlos
    ;
    Phuoc, Danh Le
    Due to the large number and heterogeneity of data sources, it becomes increasingly difficult to follow the research output and the scientific discourse. For example, a publication listed on DBLP may be discussed on Twitter and its underlying data set may be used in a different paper published on arXiv. The scientific discourse this publication is involved in is divided among not integrated systems, and for researchers it might be very hard to follow all discourses a publication or data set may be involved in. Also, many of these data sources-DBLP, arXiv, or Twitter, to name a few-are often updated in real-time. These systems are not integrated (silos), and there is no system for users to query the content/data actively or, what would be even more beneficial, in a publish/subscribe fashion, i.e., a system would actively notify researchers of work interesting to them when such work or discussions become available. In this position paper, we introduce our concept of a live open knowledge graph which can integrate an extensible set of existing or new data sources in a streaming fashion, continuously fetching data from these heterogeneous sources, and interlinking and enriching it on-the-fly. Users can subscribe to continuously query the content/data of their interest and get notified when new content/data becomes available. We also highlight open challenges in realizing a system enabling this concept at scale.
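    A minimal sketch of the publish/subscribe idea described above is given below: source-specific fetchers push new items into a hub, and subscribers register continuous queries that trigger notifications on matches. All class and source names are hypothetical and not part of the paper's system.

```python
# Minimal publish/subscribe sketch of the continuous-query idea from the abstract.
# All names are hypothetical, not the paper's architecture.
import asyncio

class LiveGraphHub:
    def __init__(self):
        self.subscriptions = []   # (predicate, callback) pairs

    def subscribe(self, predicate, callback):
        """Register a continuous query; callback fires for every matching item."""
        self.subscriptions.append((predicate, callback))

    async def publish(self, item: dict):
        """Called by source-specific fetchers (e.g. DBLP, arXiv, Twitter adapters)."""
        for predicate, callback in self.subscriptions:
            if predicate(item):
                callback(item)

async def main():
    hub = LiveGraphHub()
    hub.subscribe(lambda it: "knowledge graph" in it["title"].lower(),
                  lambda it: print("notify:", it["source"], "-", it["title"]))
    await hub.publish({"source": "arXiv", "title": "Live Open Knowledge Graphs at Scale"})
    await hub.publish({"source": "DBLP", "title": "Unrelated Paper"})

asyncio.run(main())
```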
  • Publication
    VisionKG: Towards a unified vision knowledge graph
    ( 2021)
    Le-Tuan, Anh
    ;
    Tran, Trung-Kien
    ;
    Nguyen-Duc, Manh
    ;
    Yuan, Jicheng
    ;
    Phuoc, Danh Le
    Computer Vision (CV) has recently achieved significant improvements, thanks to the evolution of deep learning. Along with advanced architectures and optimisations of deep neural networks, CV data for (cross-dataset) training, validating, and testing contributes greatly to the performance of CV models. Many CV datasets have been created for different tasks, but they are available in heterogeneous data formats and semantic representations. Therefore, it is challenging when one needs to combine different datasets either for training or testing purposes. This paper proposes a unified framework using Semantic Web technology that provides a novel way to interlink and integrate labelled data across different data sources. We demonstrate its advantages via various scenarios with the system framework accessible both online and via APIs.
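    The sketch below indicates how one might query such a unified vision knowledge graph over SPARQL, e.g. to retrieve images carrying a given label across datasets. The endpoint URL and the `vkg:` vocabulary terms are placeholders for illustration, not VisionKG's actual schema or access point.

```python
# Sketch of querying a unified vision knowledge graph over SPARQL.
# Endpoint URL and vocabulary (vkg:...) are placeholders, not the actual VisionKG schema.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://example.org/visionkg/sparql")  # hypothetical endpoint
sparql.setQuery("""
    PREFIX vkg: <http://example.org/visionkg#>
    SELECT ?image ?dataset WHERE {
        ?image vkg:hasLabel "person" ;
               vkg:fromDataset ?dataset .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)

# Print which dataset each matching image comes from.
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["dataset"]["value"], row["image"]["value"])
```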
  • Publication
    Beyond the Hype: Why Do Data-Driven Projects Fail?
    ( 2021)
    Blume, Julia
    ;
    Fabian, Benjamin
    ;
    Fomenko, Elena
    ;
    Berlin, Marcus
    Despite substantial investments, data science has failed to deliver significant business value in many companies. So far, the reasons for this problem have not been explored systematically. This study tries to find possible explanations for this shortcoming and analyses the specific challenges in data-driven projects. To identify the reasons that make data-driven projects fall short of expectations, multiple rounds of qualitative semi-structured interviews with domain experts holding different roles in data-driven projects were carried out. This was followed by a questionnaire surveying 112 experts with experience in data projects from eleven industries. Our results show that the main reasons for failure in data-driven projects are (1) a lack of understanding of the business context and user needs, (2) low data quality, and (3) data access problems. Interestingly, 54% of respondents see a conceptual gap between business strategies and the implementation of analytics solutions. Based on our results, we give recommendations on how to overcome this conceptual distance and carry out data-driven projects more successfully in the future.
  • Publication
    V2X Attack Vectors and Risk Analysis for Automated Cooperative Driving
    ( 2021)
    Sawade, Oliver
    Cooperative systems have finally entered the automotive mass market with the introduction of the 2020 VW Golf. These "Day-1" functions are aimed at driver information and warning only, but the integration of cooperative systems and automated driver assistance is already planned on several levels, such as cooperative perception and cooperative driving maneuvers. The introduction of wireless open networks into highly critical systems places the highest demands on safety and security. In this paper, we examine several cybersecurity attack vectors on Day-1 and future cooperative systems by applying the methodology used in functional safety. We evaluate attack difficulty (exposure), severity, and controllability for a selection of current and next-generation functions. From this analysis, we derive the associated risks and thus give recommendations to researchers and engineers. Finally, we simulate a selection of attacks on a platoon and evaluate function behavior and the possibility of critical system malfunction.
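    As an illustration of the functional-safety-style rating mentioned above, the sketch below combines exposure, severity, and controllability into a coarse risk level. The rating scales, weights, and thresholds are invented for illustration and are not the paper's actual classification scheme.

```python
# Illustrative risk scoring in the spirit of the functional-safety methodology
# (exposure, severity, controllability). Scales and thresholds are invented.
def risk_level(exposure: int, severity: int, controllability: int) -> str:
    """Each factor rated 1 (low) to 4 (high); controllability 4 = hard to control."""
    score = exposure + severity + controllability
    if score >= 10:
        return "high"
    if score >= 7:
        return "medium"
    return "low"

# Example: a spoofed cooperative-maneuver message in a platooning scenario.
print(risk_level(exposure=3, severity=4, controllability=3))  # -> "high"
```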
  • Publication
    A provenance meta learning framework for missing data handling methods selection
    ( 2020)
    Liu, Qian
    Missing data is a major problem in many real-world data sets and applications; it can lead to wrong or misleading analysis results and lower the quality of and confidence in those results. A large number of missing data handling methods have been proposed in the research community, but there exists no single universally best method that can handle all missing data problems. Selecting the right method for a specific missing data handling problem usually depends on multiple intertwined factors. To alleviate this method selection problem, in this paper we propose a Provenance Meta Learning Framework to simplify the process. We conducted an extensive literature review over 118 missing data handling method survey papers from 2000 to 2019. With this review, we analyse 9 influential factors and 12 selection criteria for missing data handling methods and further perform a detailed analysis of 6 popular missing data handling methods (4 machine learning methods, i.e., KNN Imputation (KNNI), Weighted KNN Imputation (WKNNI), K-Means Imputation (KMI), and Fuzzy KMI (FKMI), and 2 ad-hoc methods, i.e., Median/Mode Imputation (MMI) and Group/Class MMI (CMMI)). We focus on missing data handling method selection for 3 different classification techniques, i.e., C4.5, KNN, and RIPPER. In our evaluations, we adopt 25 real-world data sets from the KEEL and UCI data set repositories. Our Provenance Meta Learning Framework suggests using KNNI to handle missing values when the missing data mechanism is Missing Completely At Random (MCAR), the missing data pattern is a uni-attribute or monotone missing data pattern, the missing data rate is within [1%, 5%], the number of class labels is 2, and the sample size is no more than 10,000, since KNNI keeps classification performance better and achieves higher imputation accuracy and imputation exhaustiveness than all 5 other missing data handling methods when the subsequent classification method is KNN or RIPPER.
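    For readers unfamiliar with KNNI, the sketch below applies KNN imputation to a toy matrix using scikit-learn's `KNNImputer`. The data and neighbor count are invented for illustration; the paper's own experiments use the KEEL and UCI data sets.

```python
# Sketch of KNN imputation (KNNI), one of the six methods analysed in the paper,
# using scikit-learn's KNNImputer on a toy matrix with missing entries.
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([
    [1.0, 2.0, np.nan],
    [3.0, np.nan, 3.0],
    [np.nan, 6.0, 5.0],
    [8.0, 8.0, 7.0],
])

imputer = KNNImputer(n_neighbors=2)   # each missing value filled from the 2 nearest rows
X_imputed = imputer.fit_transform(X)
print(X_imputed)
```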
  • Publication
    Quantum DevOps: Towards reliable and applicable NISQ Quantum Computing
    Quantum Computing is emerging as one of the great hopes for boosting current computational resources and enabling the application of ICT for optimizing processes and solving complex and challenging domain-specific problems. However, Quantum Computing technology has not yet matured to a level where it can provide a clear advantage over high performance computing. Towards achieving this "quantum advantage", a larger number of Qubits is required, leading inevitably to a more complex topology of the computing Qubits. This raises additional difficulties with decoherence times and implies higher Qubit error rates. Nevertheless, the current Noisy Intermediate-Scale Quantum (NISQ) computers can prove useful despite the intrinsic uncertainties on the quantum hardware layer. In order to utilize such error-prone computing resources, various concepts are required to address Qubit errors and to deliver successful computations. In this paper, we describe and motivate the need for the novel concept of Quantum DevOps, which entails regular checking of the reliability of NISQ Quantum Computing (QC) instances. By testing the computational reliability of basic quantum gates and computations (C-NOT, Hadamard, etc.), it estimates the likelihood that a large-scale critical computation (e.g. calculating hourly traffic flow models for a city) provides results of sufficient quality. Following this approach to select the best matching (cloud) QC instance, and integrating it directly with the processes of development, testing, and finally the operations of quantum-based algorithms and systems, enables the Quantum DevOps concept.
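    A back-of-the-envelope sketch of this reliability check is given below: estimate a per-gate error rate from repeated runs of a simple test circuit (e.g. two Hadamards that should return |0>), then project the success probability of a larger computation under a crude independence assumption. The shot counts and gate count are invented, and the model is deliberately simplistic; it only illustrates the kind of estimate a Quantum DevOps pipeline might use to pick a QC instance.

```python
# Back-of-the-envelope sketch of the Quantum DevOps reliability check.
# Numbers are invented; the independence assumption is a deliberate simplification.
def estimated_gate_error(correct_shots: int, total_shots: int) -> float:
    """Error rate from a test circuit (e.g. two Hadamards returning |0>)."""
    return 1.0 - correct_shots / total_shots

def projected_success(gate_error: float, gate_count: int) -> float:
    """Crude projection: assume every gate must succeed independently."""
    return (1.0 - gate_error) ** gate_count

err = estimated_gate_error(correct_shots=9_850, total_shots=10_000)
print(f"per-gate error estimate: {err:.3%}")
print(f"projected success of a 500-gate circuit: {projected_success(err, 500):.1%}")
# A DevOps pipeline could run this regularly and pick the (cloud) QC instance
# with the best projected success for the upcoming computation.
```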