Prof. Dr. Stefan Wrobel
Now showing 1–10 of 26

Publication: Data Ecosystems: A New Dimension of Value Creation Using AI and Machine Learning (2022-07-22)

Machine learning and artificial intelligence have become crucial factors for the competitiveness of individual companies and entire economies. Yet their successful deployment requires access to a large volume of training data that is often not available even to the largest corporations. The rise of trustworthy federated digital ecosystems will significantly improve data availability for all participants and thus allow a quantum leap in the widespread adoption of artificial intelligence at companies of all scales and in all sectors of the economy. In this chapter, we explain how AI systems are built with data science and machine learning principles and describe how this leads to AI platforms. We detail the principles of distributed learning, which represents a perfect match with the principles of distributed data ecosystems, and discuss how trust, as a central value proposition of modern ecosystems, carries over to creating trustworthy AI systems.

Publication: A generalized Weisfeiler-Lehman graph kernel (2022-04-27)
Schulz, Till Hendrik; Welke, Pascal

After more than one decade, Weisfeiler-Lehman graph kernels are still among the most prevalent graph kernels due to their remarkable predictive performance and time complexity. They are based on a fast iterative partitioning of vertices, originally designed for deciding graph isomorphism with one-sided error. The Weisfeiler-Lehman graph kernels retain this idea and compare such labels with respect to equality. This binary-valued comparison is, however, arguably too rigid for defining suitable graph kernels for certain graph classes. To overcome this limitation, we propose a generalization of Weisfeiler-Lehman graph kernels which takes into account a more natural and finer grade of similarity between Weisfeiler-Lehman labels than equality. We show that the proposed similarity can be calculated efficiently by means of the Wasserstein distance between certain vectors representing Weisfeiler-Lehman labels. This and other facts give rise to the natural choice of partitioning the vertices with the Wasserstein k-means algorithm. We empirically demonstrate on the Weisfeiler-Lehman subtree kernel, one of the most prominent Weisfeiler-Lehman graph kernels, that our generalization significantly outperforms this and other state-of-the-art graph kernels in terms of predictive performance on datasets containing structurally more complex graphs beyond the typically considered molecular graphs.
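The iterative vertex partitioning underlying Weisfeiler-Lehman kernels can be sketched in a few lines. The following is a minimal illustration of the label-refinement step (the function name and dict-based graph encoding are our own, not from the paper):

```python
from collections import defaultdict

def wl_refine(adjacency, labels, iterations=2):
    """Weisfeiler-Lehman vertex refinement (an illustrative sketch).

    adjacency: dict vertex -> list of neighbour vertices
    labels:    dict vertex -> initial (hashable) label
    Returns the multiset (as a counter) of all labels produced.
    """
    history = defaultdict(int)
    for label in labels.values():
        history[label] += 1
    current = dict(labels)
    for _ in range(iterations):
        new = {}
        for v in adjacency:
            # compress own label plus sorted neighbour labels into a new label
            new[v] = (current[v], tuple(sorted(current[u] for u in adjacency[v])))
        current = new
        for label in current.values():
            history[label] += 1
    return history
```

The WL subtree kernel compares two graphs by counting equal labels in these multisets; the generalization proposed here instead measures a finer-grained Wasserstein similarity between the labels.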
Publication: Decision Snippet Features (2021)
Welke, Pascal; Alkhoury, Fouad

Decision trees excel at the interpretability of their prediction results. To achieve the required prediction accuracies, however, large ensembles of decision trees (random forests) are often considered, which reduces interpretability due to their large size. Additionally, their size slows down inference on modern hardware and restricts their applicability in low-memory embedded devices. We introduce Decision Snippet Features, which are obtained from small subtrees that appear frequently in trained random forests. We subsequently show that linear models on top of these features achieve comparable, and sometimes even better, predictive performance than the original random forest, while reducing the model size by up to two orders of magnitude.
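The feature construction can be illustrated as follows: each frequent subtree (snippet) maps an instance to the leaf it reaches, and a linear model is then trained on these categorical features. This is a minimal sketch with a hypothetical nested-dict snippet encoding, not the paper's actual implementation:

```python
def snippet_feature(x, snippet):
    """Evaluate one decision snippet on an instance x (an illustrative sketch).

    A snippet is a small decision subtree, encoded here as a nested dict
    {'feature': i, 'threshold': t, 'left': child, 'right': child}, where a
    child is either another such dict or a leaf. The feature value is the
    index of the leaf that x reaches in the snippet.
    """
    node, path = snippet, 0
    while isinstance(node, dict):
        if x[node['feature']] <= node['threshold']:
            node, path = node['left'], 2 * path + 1
        else:
            node, path = node['right'], 2 * path + 2
    return path

def snippet_features(x, snippets):
    # one categorical feature per frequent snippet; a linear model is then
    # trained on (a one-hot encoding of) this feature vector
    return [snippet_feature(x, s) for s in snippets]
```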
Publication: Maximum Margin Separations in Finite Closure Systems (2021)
Seiffarth, Florian; Horváth, Tamás

Monotone linkage functions provide a measure of proximity between elements and subsets of a ground set. Combining this notion with Vapnik's idea of support vector machines, we extend the concepts of maximal closed set and half-space separation in finite closure systems to those with maximum margin. In particular, we define the notion of margin for finite closure systems by means of monotone linkage functions and give a greedy algorithm that efficiently computes a maximum margin closed set separation for two sets. The output closed sets are maximum margin half-spaces, i.e., form a partitioning of the ground set, if the closure system is Kakutani. We have empirically evaluated our approach on different synthetic datasets. In addition to binary classification of finite subsets of Euclidean space, we also considered the problem of vertex classification in graphs. Our experimental results provide clear evidence that maximal closed set separation with maximum margin results in much better predictive performance than that with arbitrary maximal closed sets.
Publication: Constructing Spaces and Times for Tactical Analysis in Football (2021)
Andrienko, Gennady; Andrienko, Natalia; Anzer, Gabriel; Bauer, P.; Budziak, G.; Fuchs, G.; Hecker, D.; Weber, H.

A possible objective in analyzing trajectories of multiple simultaneously moving objects, such as football players during a game, is to extract and understand the general patterns of coordinated movement in different classes of situations as they develop. To achieve this objective, we propose an approach that combines query techniques for flexible selection of episodes of situation development, a method for dynamic aggregation of data from selected groups of episodes, and a data structure for representing the aggregates that enables their exploration and use in further analysis. The aggregation, which is meant to abstract general movement patterns, involves the construction of new time-homomorphic reference systems through iterative application of aggregation operators to a sequence of data selections. As similar patterns may occur at different spatial locations, we also propose constructing new spatial reference systems for aligning and matching movements irrespective of their absolute locations. The approach was tested on tracking data from two Bundesliga games of the 2018/2019 season. It enabled the detection of interesting and meaningful general patterns of team behaviors in three classes of situations defined by football experts. The experts found the approach and the underlying concepts worth implementing in tools for football analysts.
Publication: Visual Analytics for Data Scientists (Springer Nature, 2020)
Andrienko, Natalia; Andrienko, Gennady; Slingsby, Aidan; Turkay, Cagatay
Publication: HOPS: Probabilistic Subtree Mining for Small and Large Graphs (2020)
Welke, Pascal; Seiffarth, Florian; Kamp, Michael

Frequent subgraph mining, i.e., the identification of relevant patterns in graph databases, is a well-known data mining problem with high practical relevance, since, next to summarizing the data, the resulting patterns can also be used to define powerful domain-specific similarity functions for prediction. In recent years, significant progress has been made towards subgraph mining algorithms that scale to complex graphs by focusing on tree patterns and probabilistically allowing a small amount of incompleteness in the result. Nonetheless, the complexity of the pattern matching component used for deciding subtree isomorphism on arbitrary graphs has significantly limited the scalability of existing approaches. In this paper, we adapt sampling techniques from mathematical combinatorics to the problem of probabilistic subtree mining in arbitrary databases of many small to medium-size graphs or a single large graph. By restricting to tree patterns, we provide an algorithm that approximately counts or decides subtree isomorphism for arbitrary transaction graphs in sublinear time with one-sided error. Our empirical evaluation on a range of benchmark graph datasets shows that the novel algorithm substantially outperforms state-of-the-art approaches both in the task of approximately counting embeddings in single large graphs and in probabilistic frequent subtree mining in large databases of small to medium-sized graphs.
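The flavor of the sampling idea can be illustrated for the special case of path patterns: a random embedding is grown one vertex at a time, and the product of the branching factors yields an unbiased estimate of the embedding count, with dead ends contributing zero (the one-sided error). This simplified sketch, with names of our own choosing, omits the general tree patterns the paper actually handles:

```python
import random

def sample_path_embedding(adjacency, length, rng=random):
    """One sampling attempt for embeddings of a path with `length` edges.

    Grows a random simple path; the product of branching factors is an
    unbiased estimate of the number of (ordered) path embeddings, and a
    dead end returns 0 -- a failed attempt.
    """
    vertices = list(adjacency)
    v = rng.choice(vertices)
    weight = len(vertices)
    used = {v}
    for _ in range(length):
        candidates = [u for u in adjacency[v] if u not in used]
        if not candidates:
            return 0
        v = rng.choice(candidates)
        weight *= len(candidates)
        used.add(v)
    return weight

def estimate_embeddings(adjacency, length, samples=10000, rng=random):
    # averaging many attempts concentrates around the true embedding count
    return sum(sample_path_embedding(adjacency, length, rng)
               for _ in range(samples)) / samples
```

On a triangle graph, for instance, every single-edge attempt succeeds and returns exactly the six ordered edge embeddings, so no averaging is needed; on sparser graphs the zero-returning attempts are what make the estimator sublinear rather than exact.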
Publication: Effective approximation of parametrized closure systems over transactional data streams (2020)
Horváth, Tamás

Strongly closed itemsets, defined by a parameterized closure operator, are a generalization of ordinary closed itemsets. Depending on the strength of closedness, the family of strongly closed itemsets typically forms a tiny subfamily of ordinary closed itemsets that is stable against changes in the input. In this paper we consider the problem of mining strongly closed itemsets from transactional data streams. Utilizing their algebraic and algorithmic properties, we propose an algorithm based on reservoir sampling for approximating this type of itemset in the landmark streaming setting, prove its correctness, and show empirically that it yields a considerable speedup over a straightforward naive algorithm without any significant loss in precision and recall. We motivate the problem setting with two practical applications. In particular, we first experimentally demonstrate that the above properties, i.e., compactness and stability, make strongly closed itemsets an excellent indicator of certain types of concept drift in transactional data streams. As a second application we consider computer-aided product configuration, a real-world problem raised by an industrial project. For this problem, which is essentially exact concept identification, we propose a learning algorithm based on a certain type of subset queries formed by strongly closed itemsets and show on real-world datasets that it requires significantly fewer query evaluations than a naive algorithm based on membership queries.
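The sampling primitive the algorithm builds on is classic reservoir sampling (Algorithm R), which maintains a uniform sample of k transactions from a stream of unknown length in a single pass; the paper's algorithm maintains such a reservoir to approximate the strongly closed itemsets. A minimal sketch of the primitive itself:

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Reservoir sampling (Algorithm R): a uniform sample of k items
    from a stream of unknown length, in one pass."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # item i replaces a random reservoir slot with probability k/(i+1)
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir
```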
Publication: Maximal Closed Set and Half-Space Separations in Finite Closure Systems (2020)
Seiffarth, Florian; Horváth, Tamás

Motivated by various binary classification problems in structured data (e.g., graphs or other relational and algebraic structures), we investigate some algorithmic properties of closed set and half-space separation in abstract closure systems. Assuming that the underlying closure system is finite and given by the corresponding closure operator, we formulate some negative and positive complexity results for these two separation problems. In particular, we prove that deciding half-space separability in abstract closure systems is NP-complete in general. On the other hand, for the relaxed problem of maximal closed set separation we propose a simple greedy algorithm and show that it is efficient and has the best possible lower bound on the number of closure operator calls. As a second direction to overcome the negative result above, we consider Kakutani closure systems and show first that our greedy algorithm provides an algorithmic characterization of this kind of set system. As one of the major potential application fields, we then focus on Kakutani closure systems over graphs and generalize a fundamental characterization result based on the Pasch axiom to graph-structured partitioning of finite sets. Though the primary focus of this work is on the generality of the results obtained, we experimentally demonstrate the practical usefulness of our approach on vertex classification in different graph datasets.
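The greedy idea can be sketched abstractly: starting from the closures of the two input sets, each remaining element is tentatively added to one side, and the extension is kept whenever the closures stay disjoint. The following is an illustrative sketch under our own encoding (the closure system is given as a Python function; the interval closure is a simple Kakutani example of our own, not from the paper):

```python
def greedy_separation(universe, closure, a, b):
    """Greedy maximal closed set separation (an illustrative sketch).

    closure maps a frozenset to its closed superset (the closure operator).
    Returns two disjoint closed sets extending a and b, or None if even
    their closures already intersect.
    """
    A, B = closure(frozenset(a)), closure(frozenset(b))
    if A & B:
        return None
    for e in universe:
        if e in A or e in B:
            continue
        A2 = closure(A | {e})           # try to grow the first side
        if not (A2 & B):
            A = A2
            continue
        B2 = closure(B | {e})           # otherwise try the second side
        if not (A & B2):
            B = B2
    return A, B

def interval_closure(s):
    # a simple Kakutani closure system: intervals of integers on a line
    return frozenset(range(min(s), max(s) + 1)) if s else frozenset()
```

On the integer line, `greedy_separation(range(10), interval_closure, {1}, {7})` returns two disjoint intervals that together cover the whole ground set, illustrating the half-space property in a Kakutani closure system.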
Publication: Zertifizierung von KI-Systemen (Certification of AI Systems) (2020)
Heesen, Jessica; Müller-Quade, Jörn

The certification of artificial intelligence (AI) is regarded as a possible key prerequisite for advancing the deployment of AI systems in various areas of the economy and everyday life. For a large number of AI systems, certification can help to realize their societal benefit potential safely and in the public interest. Successful certification of AI systems enables the fulfillment of important societal and economic principles, such as legal certainty (e.g., liability and compensation), interoperability, IT security, and data protection. It can also build trust among citizens, lead to better products, and influence national and international market dynamics. To ensure that certification procedures do not become an obstacle to innovation, however, it is necessary to guarantee certain standards for AI systems, avoid overregulation, enable innovation, and, ideally, trigger new developments for a European path in AI application. In this impulse paper, experts of the Plattform Lernende Systeme systematize this tension between potentials and challenges in the certification of AI systems. The paper, produced under the lead of the working group on IT Security, Privacy, Law and Ethics and the working group on Technological Enablers and Data Science, examines various technical, legal, and ethical challenges and also provides an overview of existing initiatives for the certification of AI systems in Germany.