  • Publication
    Constructing Spaces and Times for Tactical Analysis in Football
    (2021)
    Andrienko, Gennady; Andrienko, Natalia; Anzer, Gabriel; Bauer, Pascal; Budziak, Guido; [...]; [...]; Weber, Hendrik; [...]
    A possible objective in analyzing trajectories of multiple simultaneously moving objects, such as football players during a game, is to extract and understand the general patterns of coordinated movement in different classes of situations as they develop. For achieving this objective, we propose an approach that includes a combination of query techniques for flexible selection of episodes of situation development, a method for dynamic aggregation of data from selected groups of episodes, and a data structure for representing the aggregates that enables their exploration and use in further analysis. The aggregation, which is meant to abstract general movement patterns, involves construction of new time-homomorphic reference systems owing to iterative application of aggregation operators to a sequence of data selections. As similar patterns may occur at different spatial locations, we also propose constructing new spatial reference systems for aligning and matching movements irrespective of their absolute locations. The approach was tested in application to tracking data from two Bundesliga games of the 2018/2019 season. It enabled detection of interesting and meaningful general patterns of team behaviors in three classes of situations defined by football experts. The experts found the approach and the underlying concepts worth implementing in tools for football analysts.
  • Publication
    Effective approximation of parametrized closure systems over transactional data streams
    Strongly closed itemsets, defined by a parameterized closure operator, are a generalization of ordinary closed itemsets. Depending on the strength of closedness, the family of strongly closed itemsets typically forms a tiny subfamily of ordinary closed itemsets that is stable against changes in the input. In this paper we consider the problem of mining strongly closed itemsets from transactional data streams. Utilizing their algebraic and algorithmic properties, we propose an algorithm based on reservoir sampling for approximating this type of itemsets in the landmark streaming setting, prove its correctness, and show empirically that it yields a considerable speed-up over a straightforward naive algorithm without any significant loss in precision and recall. We motivate the problem setting considered by two practical applications. In particular, we first experimentally demonstrate that the above properties, i.e., compactness and stability, make strongly closed itemsets an excellent indicator of certain types of concept drifts in transactional data streams. As a second application we consider computer-aided product configuration, a real-world problem raised by an industrial project. For this problem, which is essentially exact concept identification, we propose a learning algorithm based on a certain type of subset queries formed by strongly closed itemsets and show on real-world datasets that it requires significantly fewer query evaluations than a naive algorithm based on membership queries.
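    The abstract above names reservoir sampling as the building block of the streaming algorithm. As a rough illustration of that primitive only (not the authors' mining algorithm), the following sketch keeps a uniform random sample of k transactions from a stream of unknown length. The function name `reservoir_sample` and the representation of transactions as frozensets are assumptions made for this example.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of up to k items from an iterable.

    Classic one-pass reservoir sampling; shown only as the sampling
    primitive, not as the paper's itemset mining procedure.
    """
    rng = rng or random.Random()
    reservoir = []
    for t, item in enumerate(stream, start=1):
        if t <= k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item t replaces a random slot with probability k / t.
            j = rng.randrange(t)
            if j < k:
                reservoir[j] = item
    return reservoir


# Hypothetical transaction stream: each transaction is a set of items.
transactions = [frozenset({i % 3, i % 5}) for i in range(1000)]
sample = reservoir_sample(transactions, k=50, rng=random.Random(0))
```

    A mining step would then run on `sample` instead of the full stream, which is where a speed-up of the kind reported could come from.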
  • Publication
    A review of machine learning for the optimization of production processes
    Due to the advances in the digitalization process of the manufacturing industry and the resulting available data, there is tremendous progress and large interest in integrating machine learning and optimization methods on the shop floor in order to improve production processes. Additionally, a shortage of resources leads to increasing acceptance of new approaches, such as machine learning, to save energy, time, and resources, and avoid waste. After describing the data types that can occur in the manufacturing world, this study covers the majority of relevant literature from 2008 to 2018 dealing with machine learning and optimization approaches for product quality or process improvement in the manufacturing industry. The review shows that there is hardly any correlation between the used data, the amount of data, the machine learning algorithms, the used optimizers, and the respective production problem. The detailed correlations between these criteria and the recent progress made in this area as well as the issues that are still unsolved are discussed in this paper.
  • Publication
    Big Data, Big Opportunities
    Driven by technical innovations in computer science, ever more data are available today in all areas of business, society, and private life that could potentially be transmitted, stored, and analyzed in order to derive useful information as a basis for new services. Technical advances such as distributed or in-memory processing of data have increased our analytical capabilities to such an extent that a new class of applications appears possible. Under the catchword "Big Data", a revolution in the use of data therefore currently seems to be emerging in all areas. In light of recent studies on the use of Big Data approaches, this article attempts to examine to what extent the high public expectations are actually already reflected in practice, particularly in companies. In addition, based on general trends and trends observable in the studies, it identifies the most important challenges that the topic of Big Data must face in the coming years if it is to live up to the current high expectations in the longer term as well.
  • Publication
    Introduction to the special issue on mining and learning with graphs
    (2011)
    Vishwanathan, S.V.N.; Kaski, S.; Neville, J.; [...]
  • Publication
    Movement data anonymity through generalization
    (2010)
    Monreale, Anna; Andrienko, Gennady; Andrienko, Natalia; Giannotti, Fosca; Pedreschi, Dino; Rinzivillo, Salvatore; [...]
    Wireless networks and mobile devices, such as mobile phones and GPS receivers, sense and track the movements of people and vehicles, producing society-wide mobility databases. This is a challenging scenario for data analysis and mining. On the one hand, exciting opportunities arise out of discovering new knowledge about human mobile behavior, and thus fuel intelligent info-mobility applications. On the other hand, new privacy concerns arise when mobility data are published. The risk is particularly high for GPS trajectories, which represent movement of a very high precision and spatio-temporal resolution: the de-identification of such trajectories (i.e., forgetting the ID of their associated owners) is only a weak protection, as generally it is possible to re-identify a person by observing her routine movements. In this paper we propose a method for achieving true anonymity in a dataset of published trajectories, by defining a transformation of the original GPS trajectories based on spatial generalization and k-anonymity. The proposed method offers a formal data protection safeguard, quantified as a theoretical upper bound to the probability of re-identification. We conduct a thorough study on a real-life GPS trajectory dataset, and provide strong empirical evidence that the proposed anonymity techniques achieve the conflicting goals of data utility and data privacy. In practice, the achieved anonymity protection is much stronger than the theoretical worst case, while the quality of the cluster analysis on the trajectory data is preserved.
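    To make the combination of spatial generalization and k-anonymity in the abstract above more concrete, here is a deliberately simplified sketch. It assumes a uniform square grid as the generalization and plain suppression of rare generalized trajectories; the paper's actual transformation is not specified in the abstract and may differ substantially. The names `generalize`, `k_anonymous_release`, and `cell_size` are hypothetical.

```python
from collections import defaultdict


def generalize(trajectory, cell_size):
    """Map each (x, y) point to a grid cell and drop consecutive repeats.

    Assumption: a uniform grid stands in for the spatial generalization.
    """
    cells = []
    for x, y in trajectory:
        cell = (int(x // cell_size), int(y // cell_size))
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return tuple(cells)


def k_anonymous_release(trajectories, cell_size, k):
    """Return generalized trajectories shared by at least k owners.

    Generalized sequences supported by fewer than k trajectories are
    suppressed, so every released sequence hides at least k individuals.
    """
    groups = defaultdict(list)
    for owner_id, trajectory in trajectories.items():
        groups[generalize(trajectory, cell_size)].append(owner_id)
    # Only the generalized path and its count are released, never the IDs.
    return {path: len(ids) for path, ids in groups.items() if len(ids) >= k}


# Hypothetical toy data: owner ID -> list of (x, y) positions.
toy = {
    "a": [(0.1, 0.2), (1.5, 0.3)],
    "b": [(0.4, 0.9), (1.1, 0.8)],
    "c": [(5.0, 5.0)],
}
released = k_anonymous_release(toy, cell_size=1.0, k=2)
```

    In this toy case the paths of "a" and "b" collapse to the same cell sequence and are released with a count of 2, while the unique path of "c" is suppressed.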
  • Publication
    Frequent subgraph mining in outerplanar graphs
    In recent years there has been an increased interest in frequent pattern discovery in large databases of graph structured objects. While the frequent connected subgraph mining problem for tree datasets can be solved in incremental polynomial time, it becomes intractable for arbitrary graph databases. Existing approaches have therefore resorted to various heuristic strategies and restrictions of the search space, but have not identified a practically relevant tractable graph class beyond trees. In this paper, we consider the class of outerplanar graphs, a strict generalization of trees, develop a frequent subgraph mining algorithm for outerplanar graphs, and show that it works in incremental polynomial time for the practically relevant subclass of well-behaved outerplanar graphs, i.e., those that have only polynomially many simple cycles. We evaluate the algorithm empirically on chemo- and bioinformatics applications.
  • Publication
    Efficient discovery of interesting patterns based on strong closedness
    Finding patterns that are interesting to a user in a certain application context is one of the central goals of data mining research. Regarding all patterns above a certain frequency threshold as interesting is one way of defining interestingness. In this paper, however, we argue that in many applications, a different notion of interestingness is required in order to be able to capture "long", and thus particularly informative, patterns that are correspondingly of low frequency. To identify such patterns, our proposed measure of interestingness is based on the degree or strength of closedness of the patterns. We show that (i) indeed this definition selects long interesting patterns that are difficult to identify with frequency-based approaches, and (ii) that it selects patterns that are robust against noise and/or dynamic changes. We prove that the family of interesting patterns proposed here forms a closure system and use the corresponding closure operator to design a mining algorithm listing these patterns in amortized quadratic time. In particular, for nonsparse datasets its time complexity is O(nm) per pattern, where n denotes the number of items and m the size of the database. This is equal to the best known time bound for listing ordinary closed frequent sets, which is a special case of our problem. We also report empirical results with real-world datasets.
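    The abstract above builds on a closure operator parameterized by the strength of closedness. The sketch below illustrates one plausible reading of that idea, stated here as an assumption: an itemset is extended by any item whose addition lowers the support count by less than a threshold `delta`, until no such item remains. With `delta = 1` this reduces to the ordinary closure (items contained in every supporting transaction). The function names and the naive support counting are illustrative and not taken from the paper.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def delta_closure(itemset, transactions, items, delta):
    """Extend an itemset until every further item costs >= delta support.

    Assumed definition for illustration; recomputes supports naively and
    makes no attempt to match the time bounds stated in the abstract.
    """
    closed = set(itemset)
    changed = True
    while changed:
        changed = False
        base = support(closed, transactions)
        for e in sorted(items):
            if e in closed:
                continue
            if base - support(closed | {e}, transactions) < delta:
                closed.add(e)
                changed = True
                break  # support may have dropped, so recompute the base
    return frozenset(closed)


# Hypothetical toy database of four transactions.
db = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a"}]
universe = {"a", "b", "c"}
ordinary = delta_closure({"b"}, db, universe, delta=1)
strong = delta_closure({"b"}, db, universe, delta=2)
```

    On the toy database, `ordinary` is {a, b}, whereas the stronger requirement `delta = 2` also absorbs c, giving the longer pattern {a, b, c}.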
  • Publication
    Support-vector-machine-based ranking significantly improves the effectiveness of similarity searching using 2D fingerprints and multiple reference compounds
    (2008)
    Geppert, H.; [...]; [...]; [...]; Bajorath, J.
    Similarity searching using molecular fingerprints is computationally efficient and a surprisingly effective virtual screening tool. In this study, we have compared ranking methods for similarity searching using multiple active reference molecules. Different 2D fingerprints were used as search tools and also as descriptors for a support vector machine (SVM) algorithm. In systematic database search calculations, an SVM-based ranking scheme consistently outperformed nearest neighbor and centroid approaches, regardless of the fingerprints that were tested, even if only very small training sets were used for SVM learning. The superiority of SVM-based ranking over conventional fingerprint methods is ascribed to the fact that SVM makes use of information about database molecules, in addition to known active compounds, during the learning phase.
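    For orientation, the nearest neighbor baseline mentioned in the abstract above can be sketched as follows. This is not the SVM-based ranking the study advocates; it only shows the conventional scheme it was compared against, under the assumption that each 2D fingerprint is represented as the set of its on-bit indices and that similarity is measured with the Tanimoto coefficient. All names and the toy data are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_nearest_neighbor(database, references):
    """Rank database molecules by their best similarity to any reference.

    Each molecule is scored by its maximum Tanimoto similarity over all
    active reference compounds; ties are broken by name for determinism.
    """
    scores = {
        name: max(tanimoto(fp, ref) for ref in references)
        for name, fp in database.items()
    }
    return sorted(scores, key=lambda name: (-scores[name], name))


# Hypothetical fingerprints: sets of on-bit positions.
references = [{1, 2, 3}, {2, 3, 4}]
database = {"m1": {1, 2, 3}, "m2": {2, 3}, "m3": {7, 8}}
ranking = rank_nearest_neighbor(database, references)
```

    An SVM-based ranking would instead train on the reference compounds together with database molecules and order candidates by their signed distance from the learned hyperplane.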