  • Publication
    A study on interoperability between two Personal Health Train infrastructures in leukodystrophy data analysis
    (2024)
    Welten, Sascha; Arruda Botelho Herr, Marius De; Hempel, Lars; Hieber, David; Placzek, Peter; Graf, Michael; Weber, Sven; Neumann, Laurenz; Jugl, Maximilian; Tirpitz, Liam; Kindermann, Karl; Bonino da Silva Santos, Luiz Olavo; Pfeifer, Nico; Kohlbacher, Oliver; Kirsten, Toralf
    The development of platforms for distributed analytics has been driven by a growing need to comply with various governance-related or legal constraints. Among these platforms, the so-called Personal Health Train (PHT) is one representative that has emerged in recent years. However, in projects that require data from sites running different PHT infrastructures, institutions face challenges arising from the combination of multiple PHT ecosystems, including data governance, regulatory compliance, and the modification of existing workflows. In these scenarios, interoperability between the platforms is preferable. In this work, we introduce a conceptual framework for the technical interoperability of the PHT covering five essential requirements: data integration, unified station identifiers, mutual metadata, aligned security protocols, and business logic. We evaluated our concept in a feasibility study involving two distinct PHT infrastructures: PHT-meDIC and PADME. We analyzed data on leukodystrophy from patients at the University Hospitals of Tübingen and Leipzig, and from patients with differential diagnoses at the University Hospital Aachen. The results of our study demonstrate the technical interoperability between these two PHT infrastructures, allowing researchers to perform analyses across the participating institutions. Our method is more space-efficient than the multi-homing strategy and shows only a minimal time overhead.
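    As a rough illustration of how the "unified station identifiers" and "mutual metadata" requirements might be met, the sketch below shows a minimal shared station record. Every field name and the compatibility check are hypothetical assumptions for illustration, not part of the published framework.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: a minimal record two PHT ecosystems could
# exchange to satisfy "unified station identifiers" and "mutual metadata".
# All field names are hypothetical, not taken from PHT-meDIC or PADME.

@dataclass
class StationRecord:
    station_id: str      # globally unique identifier agreed on by both infrastructures
    infrastructure: str  # e.g. "PHT-meDIC" or "PADME"
    endpoint: str        # where analysis "trains" are submitted for this station
    public_key_pem: str  # key material for the aligned security protocol
    datasets: list = field(default_factory=list)  # dataset descriptors for data integration

def same_station(a: StationRecord, b: StationRecord) -> bool:
    """Two records refer to the same station iff their unified identifiers match."""
    return a.station_id == b.station_id
```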
  • Publication
    A Knowledge Graph for Query-Induced Analyses of Hierarchically Structured Time Series Information
    This paper introduces the concept of a knowledge graph for time series data, which allows for structured management and propagation of characteristic time series information and supports query-driven data analyses. We gradually link and enrich knowledge obtained from domain experts or previously performed analyses by representing globally and locally occurring time series insights as individual graph nodes. Supported by techniques from automated knowledge discovery and machine learning, the recursive integration of analytical query results is exploited to generate a spectral representation of linked and successively condensed information. Besides a time-series-to-graph mapping, we provide an ontology describing a classification of the maintained knowledge and the affiliated analysis methods for knowledge generation. After a discussion of gradual knowledge enrichment, we illustrate the concept of knowledge propagation based on an application of state-of-the-art methods for time series analysis.
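    The mapping from time series and their insights to individual graph nodes can be pictured with a small, hypothetical sketch (here using networkx). Node identifiers, attributes, and relation names are illustrative assumptions, not the paper's ontology.

```python
import networkx as nx

# Minimal sketch, not the authors' implementation: time series and the
# insights derived from them are individual graph nodes, so later queries
# can traverse from a series to everything already known about it.
G = nx.DiGraph()
G.add_node("ts:sensor_42", kind="time_series", resolution="1s")
G.add_node("insight:trend_1", kind="insight", method="linear_regression",
           statement="upward trend in 2021-03")
G.add_edge("insight:trend_1", "ts:sensor_42", relation="derived_from")

# A query-induced analysis writes its result back as a new, linked node;
# this is the recursive enrichment step described in the abstract.
G.add_node("insight:anomaly_7", kind="insight", method="z_score",
           statement="outlier at t=1042")
G.add_edge("insight:anomaly_7", "ts:sensor_42", relation="derived_from")

print([n for n, d in G.nodes(data=True) if d.get("kind") == "insight"])
```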
  • Publication
    pFedV: Mitigating Feature Distribution Skewness via Personalized Federated Learning with Variational Distribution Constraints
    (2023)
    Mou, Yongli; Geng, Jiahui; Zhou, Feng; Beyan, Oya Deniz; Rong, Chunming
    Statistical heterogeneity among distributed data, especially feature distribution skewness, is common in practice and poses a challenging problem in federated learning, as it can degrade the performance of the aggregated global model. In this paper, we introduce pFedV, a novel approach that takes a variational inference perspective by incorporating a variational distribution into neural networks. During training, we add a KL-divergence term to the loss function to constrain the output distribution of the feature extraction layers, and we personalize the final layer of each model. The experimental results demonstrate the effectiveness of our approach in mitigating distribution shift in feature space in federated learning.
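    A minimal sketch of the stated idea, assuming a PyTorch-style setup: a feature-extraction layer outputs a variational distribution, and a KL-divergence term against a standard normal prior is added to the task loss. The layer sizes, the weight beta, and the choice of prior are assumptions, not the authors' published code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalFeatureLayer(nn.Module):
    """Feature layer that outputs a distribution (mu, logvar) instead of a point."""
    def __init__(self, in_dim=128, out_dim=64):
        super().__init__()
        self.mu = nn.Linear(in_dim, out_dim)
        self.logvar = nn.Linear(in_dim, out_dim)

    def forward(self, h):
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        # Closed-form KL( N(mu, sigma^2) || N(0, 1) ), averaged over the batch
        kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))
        return z, kl

def loss_fn(logits, targets, kl, beta=0.01):
    # Task loss plus the distribution constraint; in a personalized setup the
    # final classification layer would stay local at each client (assumption).
    return F.cross_entropy(logits, targets) + beta * kl
```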
  • Publication
    Semantics in Dataspaces: Origin and Future Directions
    (2023)
    Kocher, Max; Paulus, Alexander; Pomp, André; Curry, Edward
    The term dataspace was coined two decades ago and has evolved since then. Definitions range from (i) an abstraction for data management in an identifiable scope, over (ii) a multi-sided data platform connecting participants in an ecosystem, to (iii) interlinking data towards loosely connected (global) information. Many implementations and scientific notions follow different interpretations of the term dataspace, but agree on some use of semantic technologies. For example, dataspaces such as the European Open Science Cloud and the German National Research Data Infrastructure are committed to applying the FAIR principles. Dataspaces built on top of Gaia-X use semantic methods for service Self-Descriptions. This paper investigates ongoing dataspace efforts and aims to provide insights into the definition of the term dataspace, the usage of semantics and the FAIR principles, and future directions for the role of semantics in dataspaces.
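    To make "semantic methods for service Self-Descriptions" concrete, here is a toy JSON-LD document in that spirit. The context URL, identifiers, and property names are simplified assumptions, not the normative Gaia-X vocabulary.

```python
import json

# Illustrative only: a minimal JSON-LD document in the spirit of a Gaia-X
# service Self-Description. Vocabulary and identifiers are hypothetical.
self_description = {
    "@context": {"gx": "https://w3id.org/gaia-x/core#"},
    "@id": "https://example.org/services/weather-api",
    "@type": "gx:ServiceOffering",
    "gx:providedBy": {"@id": "https://example.org/participants/acme"},
    "gx:termsAndConditions": "https://example.org/tc.pdf",
}

print(json.dumps(self_description, indent=2))
```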
  • Publication
    Explainable AI for Bioinformatics: Methods, Tools and Applications
    (2023)
    Karim, Md. Rezaul; Islam, Tanhim; Shajalal, Md; Beyan, Oya; Lange, Christoph; Cochez, Michael; Rebholz-Schuhmann, Dietrich
    Artificial intelligence (AI) systems utilizing deep neural networks and machine learning (ML) algorithms are widely used for solving critical problems in bioinformatics, biomedical informatics, and precision medicine. However, complex ML models are often perceived as opaque, black-box methods, making it difficult to understand the reasoning behind their decisions. This lack of transparency can be a challenge for end-users and decision-makers, as well as for AI developers. In sensitive areas such as healthcare, explainability and accountability are not only desirable properties but also legally required for AI systems that can have a significant impact on human lives. Fairness is another growing concern, as algorithmic decisions should not show bias or discrimination towards certain groups or individuals based on sensitive attributes. Explainable AI (XAI) aims to overcome the opaqueness of black-box models and to provide transparency in how AI systems make decisions. Interpretable ML models can explain how they make predictions and identify the factors that influence their outcomes. However, the majority of state-of-the-art interpretable ML methods are domain-agnostic and have evolved from fields such as computer vision, automated reasoning, or statistics, making their direct application to bioinformatics problems challenging without customization and domain adaptation. In this paper, we discuss the importance of explainability and algorithmic transparency in the context of bioinformatics. We provide an overview of model-specific and model-agnostic interpretable ML methods and tools and outline their potential limitations. We discuss how existing interpretable ML methods can be customized and fitted to bioinformatics research problems. Further, through case studies in bioimaging, cancer genomics, and text mining, we demonstrate how XAI methods can improve transparency and decision fairness. Our review aims to provide valuable insights and to serve as a starting point for researchers wanting to enhance explainability and decision transparency while solving bioinformatics problems. GitHub: https://github.com/rezacsedu/XAI-for-bioinformatics.
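    As one concrete instance of a model-agnostic method from the family surveyed here (a generic illustration, not one of the paper's case studies), permutation importance treats any fitted model as a black box and measures how much shuffling each feature degrades performance:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Fit an opaque model on a standard biomedical dataset.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Model-agnostic explanation: shuffle each feature on held-out data and
# record the drop in score; larger drops mean more influential features.
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
top = result.importances_mean.argsort()[::-1][:5]
print("Most influential features (by index):", top)
```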
  • Publication
    What prevents us from reusing medical real-world data in research
    (2023)
    Gehrmann, Julia; Herczog, Edit; Beyan, Oya Deniz
  • Publication
    Will it run?—A proof of concept for smoke testing decentralized data analytics experiments
    (2023)
    Welten, Sascha; Weber, Sven; Holt, Adrian; Beyan, Oya Deniz
    The growing interest in data-driven medicine, in conjunction with the formation of initiatives such as the European Health Data Space (EHDS), has demonstrated the need for methodologies capable of facilitating privacy-preserving data analysis. Distributed Analytics (DA), as an enabler for privacy-preserving analysis across multiple data sources, has shown its potential to support data-intensive research. However, the application of DA creates new challenges stemming from its distributed nature, such as identifying single points of failure (SPOFs) in DA tasks before their actual execution. Failing to detect such SPOFs can, for example, result in improper termination of the DA code, necessitating additional effort from multiple stakeholders to resolve the malfunction. Moreover, such malfunctions disrupt the seamless conduct of DA and entail several crucial consequences, including technical obstacles, potential delays in research outcomes, and increased costs. In this study, we address this challenge by introducing a concept based on a method called Smoke Testing, an initial and foundational test run to ensure the operability of the analysis code. We review existing DA platforms and systematically extract six specific Smoke Testing criteria for DA applications. With these criteria in mind, we create an interactive environment called Development Environment for AuTomated and Holistic Smoke Testing of Analysis-Runs (DEATHSTAR), which allows researchers to perform Smoke Tests on their DA experiments. We conduct a user study with 29 participants to assess our environment and additionally apply it to three real use cases. The results of our evaluation validate its effectiveness, revealing that 96.6% of the analyses created and (Smoke) tested by participants using our approach terminated without any errors. Thus, by incorporating Smoke Testing as a fundamental method, our approach helps identify potential malfunctions early in the development process, ensuring smoother data-driven research within the scope of DA. Through its flexibility and adaptability to diverse real use cases, our solution enables more robust and efficient development of DA experiments, which contributes to their reliability.
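    The essence of a Smoke Test, running the analysis once against harmless sample data and checking that it terminates cleanly, can be sketched as follows. The container image name and mount path are hypothetical; this is not the DEATHSTAR implementation.

```python
import subprocess

def smoke_test(image: str, sample_data_dir: str, timeout_s: int = 300) -> bool:
    """Run a containerized analysis once on synthetic data; pass iff it exits cleanly."""
    proc = subprocess.run(
        ["docker", "run", "--rm",
         "-v", f"{sample_data_dir}:/data:ro",  # read-only synthetic sample data
         image],
        capture_output=True, timeout=timeout_s,
    )
    if proc.returncode != 0:
        print("Smoke test failed:", proc.stderr.decode(errors="replace")[:500])
    return proc.returncode == 0

if __name__ == "__main__":
    # Hypothetical image and data directory, for illustration only.
    ok = smoke_test("analysis-train:latest", "./sample_data")
    print("operable" if ok else "potential SPOF detected before rollout")
```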
  • Publication
    Integrating an XPath-Enhanced OPC UA Data Collection Into Industrial Communication
    (2022-09-06)
    Kocher, Max; Rath, Michael; Ulrich, Sebastian; Rudack, Maximilian
    Recent trends have made more and more data available from automated manufacturing systems. Many data scientists collect data from production machines via the fast and widespread OPC Unified Architecture (OPC UA) communication protocol. However, heterogeneous vendor-specific configurations require high manual effort for establishing new data collections, and therefore valuable metadata is either extracted hard-coded or not at all. Our previous publication tackled this challenge purely from a computer science perspective and proposed a transformation of the query language XPath to OPC UA to enable more convenient and expressive queries. Remaining open research questions include a more sophisticated data collection from multiple automated manufacturing systems and proper integration into enterprise-scale industrial communication systems. This paper takes the manufacturing perspective and covers these research questions with a real-world implementation for a 500-ton High Pressure Die Casting (HPDC) production cell with six embedded OPC UA servers. We formulate XPath queries and apply them to augment the current workflow, which only extracts raw sensor values, into a more comprehensive one that additionally captures metadata such as units, value ranges, and measurement precision. Our main contributions include the aggregation of multiple individual data collection setups into a general one and the embedding of these into a fully fledged data lake integration. The results demonstrate the integration of an XPath-enhanced OPC UA data collection into industrial communication for automated manufacturing systems, which dramatically reduces complexity and manual effort for experts.
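    A rough sketch of the intended workflow, not the authors' XPath-to-OPC-UA transformation itself: a declarative query selects a sensor value together with its metadata, approximated here with the python-opcua client. The endpoint, node id, and XPath expression are assumptions.

```python
from opcua import Client  # python-opcua

# The idea: instead of hard-coding node ids per vendor, a query like this
# would select a value together with its metadata (hypothetical expression).
xpath_query = "//Machine1/Temperature[@EngineeringUnits]"

client = Client("opc.tcp://localhost:4840")  # assumed server endpoint
client.connect()
try:
    node = client.get_node("ns=2;s=Machine1.Temperature")  # assumed node id
    value = node.get_value()
    # EngineeringUnits is a standard OPC UA property; get_properties()
    # returns the property nodes attached to this variable.
    props = {p.get_browse_name().Name: p.get_value() for p in node.get_properties()}
    print(value, props.get("EngineeringUnits"))
finally:
    client.disconnect()
```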
  • Publication
    SASP: a Semantic web-based Approach for management of Sharable cybersecurity Playbooks
    In incident management, response and recovery actions are designed to effectively mitigate ongoing or future cyberattacks. A security playbook consists of a pipeline of instructions documenting the response and recovery actions needed to deal with a specific type of incident. Since many organisations lack the resources, expertise, and know-how to handle incidents, sharing playbooks across organisations could significantly improve their response capabilities against cyberattacks. However, playbooks are often organisation-specific and usually not machine-readable, sharable, and interoperable. In this work, we propose a semantic web-based approach to capture the knowledge of incident response and recovery steps and to support the sharing of playbooks based on a standardised and common vocabulary. To demonstrate our approach, we introduce SASP, a proof-of-concept tool based on Semantic MediaWiki for playbook management. In this paper, we describe the key requirements of incident handlers for sharing playbooks, the SASP architecture design, and its core components and functionalities. We then discuss the results of our user-centric evaluation conducted with members of different Security Operation Centres and the further potential of the solution.
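    The core idea, encoding playbook steps in a standardised, machine-readable vocabulary, might look like the following rdflib sketch. The pb: vocabulary is a hypothetical stand-in for the one the paper proposes.

```python
from rdflib import Graph, Literal, Namespace, RDF

# Hypothetical playbook vocabulary, for illustration only.
PB = Namespace("https://example.org/playbook#")
g = Graph()
g.bind("pb", PB)

# One response action for a ransomware incident, linked to its successor,
# so the whole pipeline of instructions becomes shareable RDF.
g.add((PB.step1, RDF.type, PB.ResponseAction))
g.add((PB.step1, PB.handlesIncident, PB.Ransomware))
g.add((PB.step1, PB.instruction, Literal("Isolate affected hosts from the network")))
g.add((PB.step1, PB.nextStep, PB.step2))

print(g.serialize(format="turtle"))
```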