Now showing 1 - 8 of 8
  • Publication
    The why and how of trustworthy AI
    Artificial intelligence is increasingly penetrating industrial applications as well as areas that affect our daily lives. As a consequence, there is a need for criteria to validate whether the quality of AI applications is sufficient for their intended use. Both in the academic community and societal debate, an agreement has emerged under the term “trustworthiness” as the set of essential quality requirements that should be placed on an AI application. At the same time, the question of how these quality requirements can be operationalized is to a large extent still open. In this paper, we consider trustworthy AI from two perspectives: the product and organizational perspective. For the former, we present an AI-specific risk analysis and outline how verifiable arguments for the trustworthiness of an AI application can be developed. For the second perspective, we explore how an AI management system can be employed to assure the trustworthiness of an organization with respect to its handling of AI. Finally, we argue that in order to achieve AI trustworthiness, coordinated measures from both product and organizational perspectives are required.
  • Publication
    A generalized Weisfeiler-Lehman graph kernel
    ( 2022-04-27)
    Schulz, Till Hendrik
    ;
    ;
    Welke, Pascal
    ;
    After more than one decade, Weisfeiler-Lehman graph kernels are still among the most prevalent graph kernels due to their remarkable predictive performance and time complexity. They are based on a fast iterative partitioning of vertices, originally designed for deciding graph isomorphism with one-sided error. The Weisfeiler-Lehman graph kernels retain this idea and compare such labels with respect to equality. This binary valued comparison is, however, arguably too rigid for defining suitable graph kernels for certain graph classes. To overcome this limitation, we propose a generalization of Weisfeiler-Lehman graph kernels which takes into account a more natural and finer grade of similarity between Weisfeiler-Lehman labels than equality. We show that the proposed similarity can be calculated efficiently by means of the Wasserstein distance between certain vectors representing Weisfeiler-Lehman labels. This and other facts give rise to the natural choice of partitioning the vertices with the Wasserstein k-means algorithm. We empirically demonstrate on the Weisfeiler-Lehman subtree kernel, which is one of the most prominent Weisfeiler-Lehman graph kernels, that our generalization significantly outperforms this and other state-of-the-art graph kernels in terms of predictive performance on datasets which contain structurally more complex graphs beyond the typically considered molecular graphs.
  • Publication
    Constructing Spaces and Times for Tactical Analysis in Football
    ( 2021)
    Andrienko, Gennady
    ;
    Andrienko, Natalia
    ;
    Anzer, Gabriel
    ;
    Bauer, P.
    ;
    Budziak, G.
    ;
    Fuchs, G.
    ;
    Hecker, D.
    ;
    Weber, H.
    ;
    A possible objective in analyzing trajectories of multiple simultaneously moving objects, such as football players during a game, is to extract and understand the general patterns of coordinated movement in different classes of situations as they develop. For achieving this objective, we propose an approach that includes a combination of query techniques for flexible selection of episodes of situation development, a method for dynamic aggregation of data from selected groups of episodes, and a data structure for representing the aggregates that enables their exploration and use in further analysis. The aggregation, which is meant to abstract general movement patterns, involves construction of new time-homomorphic reference systems owing to iterative application of aggregation operators to a sequence of data selections. As similar patterns may occur at different spatial locations, we also propose constructing new spatial reference systems for aligning and matching movements irrespective of their absolute locations. The approach was tested in application to tracking data from two Bundesliga games of the 2018/2019 season. It enabled detection of interesting and meaningful general patterns of team behaviors in three classes of situations defined by football experts. The experts found the approach and the underlying concepts worth implementing in tools for football analysts.
  • Publication
    Effective approximation of parametrized closure systems over transactional data streams
    ( 2020) ;
    Horváth, Tamás
    ;
    Strongly closed itemsets, defined by a parameterized closure operator, are a generalization of ordinary closed itemsets. Depending on the strength of closedness, the family of strongly closed itemsets typically forms a tiny subfamily of ordinary closed itemsets that is stable against changes in the input. In this paper we consider the problem of mining strongly closed itemsets from transactional data streams. Utilizing their algebraic and algorithmic properties, we propose an algorithm based on reservoir sampling for approximating this type of itemsets in the landmark streaming setting, prove its correctness, and show empirically that it yields a considerable speed-up over a straightforward naive algorithm without any significant loss in precision and recall. We motivate the problem setting considered by two practical applications. In particular, we first experimentally demonstrate that the above properties, i.e., compactness and stability, make strongly closed itemsets an excellent indicator of certain types of concept drifts in transactional data streams. As a second application we consider computer-aided product configuration, a real-world problem raised by an industrial project. For this problem, which is essentially exact concept identification, we propose a learning algorithm based on a certain type of subset queries formed by strongly closed itemsets and show on real-world datasets that it requires significantly less query evaluations than a naive algorithm based on membership queries.
  • Publication
    Probabilistic and exact frequent subtree mining in graphs beyond forests
    ( 2019)
    Welke, Pascal
    ;
    ;
    Motivated by the impressive predictive power of simple patterns, we consider the problem of mining frequent subtrees in arbitrary graphs. Although the restriction of the pattern language to trees does not resolve the computational complexity of frequent subgraph mining, in a recent work we have shown that it gives rise to an algorithm generating probabilistic frequent subtrees, a random subset of all frequent subtrees, from arbitrary graphs with polynomial delay. It is based on replacing each transaction graph in the input database with a forest formed by a random subset of its spanning trees. This simple technique turned out to be quite powerful on molecule classification tasks. It has, however, the drawback that the number of sampled spanning trees must be bounded by a polynomial of the size of the transaction graphs, resulting in less impressive recall even for slightly more complex structures beyond molecular graphs. To overcome this limitation, in this work we propose an algorithm mining probabilistic frequent subtrees also with polynomial delay, but by replacing each graph with a forest formed by an exponentially large implicit subset of its spanning trees. We demonstrate the superiority of our algorithm over the simple one on threshold graphs used e.g. in spectral clustering. In addition, providing sufficient conditions for the completeness and efficiency of our algorithm, we obtain a positive complexity result on exact frequent subtree mining for a novel, practically and theoretically relevant graph class that is orthogonal to all graph classes defined by some constant bound on monotone graph properties.
  • Publication
    Probabilistic frequent subtrees for efficient graph classification and retrieval
    ( 2018)
    Welke, Pascal
    ;
    ;
    Frequent subgraphs proved to be powerful features for graph classification and prediction tasks. Their practical use is, however, limited by the computational intractability of pattern enumeration and that of graph embedding into frequent subgraph feature spaces. We propose a simple probabilistic technique that resolves both limitations. In particular, we restrict the pattern language to trees and relax the demand on the completeness of the mining algorithm, as well as on the correctness of the pattern matching operator by replacing transaction and query graphs with small random samples of their spanning trees. In this way we consider only a random subset of frequent subtrees, called probabilistic frequent subtrees, that can be enumerated efficiently. Our extensive empirical evaluation on artificial and benchmark molecular graph datasets shows that probabilistic frequent subtrees can be listed in practically feasible time and that their predictive and retrieval performance is very close even to those of complete sets of frequent subgraphs. We also present different fast techniques for computing the embedding of unseen graphs into (probabilistic frequent) subtree feature spaces. These algorithms utilize the partial order on tree patterns induced by subgraph isomorphism and, as we show empirically, require much less evaluations of subtree isomorphism than the standard brute-force algorithm. We also consider partial embeddings, i.e., when only a part of the feature vector has to be calculated. In particular, we propose a highly effective practical algorithm that significantly reduces the number of pattern matching evaluations required by the classical min-hashing algorithm approximating Jaccard-similarities.
  • Publication
    Spatiotemporal modeling and analysis. Introduction and overview
    ( 2012)
    Körner, Christine
    ;
    ;
    Over the past ve to seven years the analysis of trajectory data has established itself as an independent research discipline within the area of data mining. In this article we provide an overview on data characteristics, state-of-the-art preprocessing and analysis methods of trajectory data. We conclude the article with a collection of challenges that arise due to the increasing variety of spatiotemporal data sources and which have to be solved for the application of spatiotemporal analysis methods in practice.