Now showing 1 - 8 of 8
  • Publication
    The why and how of trustworthy AI
    Artificial intelligence is increasingly penetrating industrial applications as well as areas that affect our daily lives. As a consequence, there is a need for criteria to validate whether the quality of AI applications is sufficient for their intended use. Both in the academic community and societal debate, an agreement has emerged under the term “trustworthiness” as the set of essential quality requirements that should be placed on an AI application. At the same time, the question of how these quality requirements can be operationalized is to a large extent still open. In this paper, we consider trustworthy AI from two perspectives: the product and organizational perspective. For the former, we present an AI-specific risk analysis and outline how verifiable arguments for the trustworthiness of an AI application can be developed. For the second perspective, we explore how an AI management system can be employed to assure the trustworthiness of an organization with respect to its handling of AI. Finally, we argue that in order to achieve AI trustworthiness, coordinated measures from both product and organizational perspectives are required.
  • Publication
    Probabilistic and exact frequent subtree mining in graphs beyond forests
    ( 2019)
    Welke, Pascal
    ;
    ;
    Motivated by the impressive predictive power of simple patterns, we consider the problem of mining frequent subtrees in arbitrary graphs. Although the restriction of the pattern language to trees does not resolve the computational complexity of frequent subgraph mining, in a recent work we have shown that it gives rise to an algorithm generating probabilistic frequent subtrees, a random subset of all frequent subtrees, from arbitrary graphs with polynomial delay. It is based on replacing each transaction graph in the input database with a forest formed by a random subset of its spanning trees. This simple technique turned out to be quite powerful on molecule classification tasks. It has, however, the drawback that the number of sampled spanning trees must be bounded by a polynomial of the size of the transaction graphs, resulting in less impressive recall even for slightly more complex structures beyond molecular graphs. To overcome this limitation, in this work we propose an algorithm mining probabilistic frequent subtrees also with polynomial delay, but by replacing each graph with a forest formed by an exponentially large implicit subset of its spanning trees. We demonstrate the superiority of our algorithm over the simple one on threshold graphs used e.g. in spectral clustering. In addition, providing sufficient conditions for the completeness and efficiency of our algorithm, we obtain a positive complexity result on exact frequent subtree mining for a novel, practically and theoretically relevant graph class that is orthogonal to all graph classes defined by some constant bound on monotone graph properties.
  • Publication
    Probabilistic frequent subtrees for efficient graph classification and retrieval
    ( 2018)
    Welke, Pascal
    ;
    ;
    Frequent subgraphs proved to be powerful features for graph classification and prediction tasks. Their practical use is, however, limited by the computational intractability of pattern enumeration and that of graph embedding into frequent subgraph feature spaces. We propose a simple probabilistic technique that resolves both limitations. In particular, we restrict the pattern language to trees and relax the demand on the completeness of the mining algorithm, as well as on the correctness of the pattern matching operator by replacing transaction and query graphs with small random samples of their spanning trees. In this way we consider only a random subset of frequent subtrees, called probabilistic frequent subtrees, that can be enumerated efficiently. Our extensive empirical evaluation on artificial and benchmark molecular graph datasets shows that probabilistic frequent subtrees can be listed in practically feasible time and that their predictive and retrieval performance is very close even to those of complete sets of frequent subgraphs. We also present different fast techniques for computing the embedding of unseen graphs into (probabilistic frequent) subtree feature spaces. These algorithms utilize the partial order on tree patterns induced by subgraph isomorphism and, as we show empirically, require much less evaluations of subtree isomorphism than the standard brute-force algorithm. We also consider partial embeddings, i.e., when only a part of the feature vector has to be calculated. In particular, we propose a highly effective practical algorithm that significantly reduces the number of pattern matching evaluations required by the classical min-hashing algorithm approximating Jaccard-similarities.
  • Publication
    Spatiotemporal modeling and analysis. Introduction and overview
    Over the past ve to seven years the analysis of trajectory data has established itself as an independent research discipline within the area of data mining. In this article we provide an overview on data characteristics, state-of-the-art preprocessing and analysis methods of trajectory data. We conclude the article with a collection of challenges that arise due to the increasing variety of spatiotemporal data sources and which have to be solved for the application of spatiotemporal analysis methods in practice.
  • Publication
    Challenging problems of geospatial visual analytics
    ( 2011)
    Andrienko, Gennady
    ;
    Andrienko, Natalia
    ;
    Keim, Daniel A.
    ;
    MacEachren, Alan M.
    ;
  • Publication
    A conceptual framework and taxonomy of techniques for analyzing movement
    ( 2011)
    Andrienko, Gennady
    ;
    Andrienko, Natalia
    ;
    Bak, Peter
    ;
    Keim, Daniel A.
    ;
    Kisilevich, S.
    ;
    Movement data link together space, time, and objects positioned in space and time. They hold valuable and multifaceted information about moving objects, properties of space and time as well as events and processes occurring in space and time. We present a conceptual framework that describes in a systematic and comprehensive way the possible types of information that can be extracted from movement data and on this basis defines the respective types of analytical tasks. Tasks are distinguished according to the type of information they target and according to the level of analysis, which may be elementary (i.e. addressing specific elements of a set) or synoptic (i.e. addressing a set or subsets). We also present a taxonomy of generic analytic techniques, in which the types of tasks are linked to the corresponding classes of techniques that can support fulfilling them. We include techniques from several research fields: visualization and visual analytics, geographic informa tion science, database technology, and data mining. We expect the taxonomy to be valuable for analysts and researchers. Analysts will receive guidance in choosing suitable analytic techniques for their data and tasks. Researchers will learn what approaches exist in different fields and compare or relate them to the approaches they are going to undertake.
  • Publication
    Listing closed sets of strongly accessible set systems with applications to data mining
    We study the problem of listing all closed sets of a closure operator a that is a partial function on the power set of some finite ground set E, i.e., sigma : F -> F with F subset of P(E). A very simple divide-and-conquer algorithm is analyzed that correctly solves this problem if and only if the domain of the closure operator is a strongly accessible set system. Strong accessibility is a strict relaxation of greedoids as well as of independence systems. This algorithm turns out to have delay O (vertical bar E vertical bar (T-F + T-sigma + vertical bar E vertical bar)) and space O (vertical bar E vertical bar + S-F + S-sigma), where T-F, S-F, T-sigma, and S-sigma are the time and space complexities of checking membership in F and computing a, respectively. In contrast, we show that the problem becomes intractable for accessible set systems. We relate our results to the data mining problem of listing all support-closed patterns of a dataset and show that there is a corresponding closure operator for all datasets if and only if the set system satisfies a certain confluence property.
  • Publication
    Visual analytics tools for analysis of movement data
    ( 2007)
    Andrienko, Gennady
    ;
    Andrienko, Natalia
    ;
    With widespread availability of low cost GPS devices, it is becoming possible to record data about the movement of people and objects at a large scale. While these data hide important knowledge for the optimization of location and mobility oriented infrastructures and services, by themselves they lack the necessary semantic embedding which would make fully automatic algorithmic analysis possible. At the same time, making the semantic link is easy for humans who however cannot deal well with massive amounts of data. In this paper, we argue that by using the right visual analytics tools for the analysis of massive collections of movement data, it is possible to effectively support human analysts in understanding movement behaviors and mobility patterns. We suggest a framework for analysis combining interactive visual displays, which are essential for supporting human perception, cognition, and reasoning, with database operations and computational methods, which are necessary for handling large amounts of data. We demonstrate the synergistic use of these techniques in case studies of two real datasets.