Prof. Dr.
Wrobel, Stefan

Publication: Learning Weakly Convex Sets in Metric Spaces (2021-09-10)
Stadtländer, Eike
We introduce the notion of weak convexity in metric spaces, a generalization of ordinary convexity commonly used in machine learning. It is shown that weakly convex sets can be characterized by a closure operator and have a unique decomposition into a set of pairwise disjoint connected blocks. We give two generic efficient algorithms, an extensional and an intensional one, for learning weakly convex concepts and study their formal properties. Our experimental results concerning vertex classification clearly demonstrate the excellent predictive performance of the extensional algorithm. Two non-trivial applications of the intensional algorithm to polynomial PAC-learnability are presented. The first one deals with learning k-convex Boolean functions, which are already known to be efficiently PAC-learnable. It is shown how to derive this positive result in a fairly easy way by the generic intensional algorithm. The second one is concerned with the Euclidean space equipped with the Manhattan distance. For this metric space, weakly convex sets form a union of pairwise disjoint axis-aligned hyperrectangles. We show that a weakly convex set that is consistent with a set of examples and contains a minimum number of hyperrectangles can be found in polynomial time. In contrast, this problem is known to be NP-complete if the hyperrectangles may be overlapping.
Publication: A Novel Regression Loss for Non-Parametric Uncertainty Optimization (2021)
Sicking, Joachim; Pintz, Maximilian; Fischer, Asja
Quantification of uncertainty is one of the most promising approaches to establish safe machine learning. Despite its importance, it is far from being generally solved, especially for neural networks. One of the most commonly used approaches so far is Monte Carlo dropout, which is computationally cheap and easy to apply in practice. However, it can underestimate the uncertainty. We propose a new objective, referred to as second-moment loss (SML), to address this issue. While the full network is encouraged to model the mean, the dropout networks are explicitly used to optimize the model variance. We intensively study the performance of the new objective on various UCI regression datasets. Compared to the state of the art of deep ensembles, SML leads to comparable prediction accuracies and uncertainty estimates while only requiring a single model. Under distribution shift, we observe moderate improvements. As a side result, we introduce an intuitive Wasserstein distance-based uncertainty measure that is non-saturating and thus allows resolving quality differences between any two uncertainty estimates.
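The Monte Carlo dropout baseline mentioned in this abstract can be sketched as follows. This is only an illustration of plain MC dropout, not of the second-moment loss itself, whose exact form the abstract does not give. The tiny one-hidden-layer network, its weights, and the names `forward` and `mc_dropout_predict` are assumptions made for the example.

```python
import random
from statistics import fmean, pvariance

def forward(x, w_hidden, w_out, mask=None, keep_prob=1.0):
    """Hypothetical one-hidden-layer ReLU regressor (scalar input and output).

    With mask=None this is the "full network"; with a 0/1 mask it is one
    dropout sub-network.
    """
    hidden = [max(0.0, w * x) for w in w_hidden]
    if mask is not None:
        # Inverted dropout: rescale kept units so expected activations match.
        hidden = [h * m / keep_prob for h, m in zip(hidden, mask)]
    return sum(h * w for h, w in zip(hidden, w_out))

def mc_dropout_predict(x, w_hidden, w_out, keep_prob=0.8, n_samples=200, seed=0):
    """Return (mean, variance) of the outputs over sampled dropout masks."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        # Sample one dropout sub-network by keeping each unit with keep_prob.
        mask = [1.0 if rng.random() < keep_prob else 0.0 for _ in w_hidden]
        outputs.append(forward(x, w_hidden, w_out, mask, keep_prob))
    return fmean(outputs), pvariance(outputs)

# Illustrative weights only; a real model would be trained first.
W_HIDDEN = [1.0, 2.0, 3.0]
W_OUT = [0.5, -1.0, 2.0]
mean, variance = mc_dropout_predict(1.0, W_HIDDEN, W_OUT, keep_prob=0.5)
```

The spread of the sub-network outputs serves as the uncertainty estimate; with `keep_prob=1.0` every sampled network equals the full network and the variance collapses to zero.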
Publication: Support Estimation in Frequent Itemset Mining by Locality Sensitive Hashing (2019)
The main computational effort in generating all frequent itemsets in a transactional database is in the step of deciding whether an itemset is frequent or not. We present a method for estimating itemset supports with two-sided error. In a preprocessing step our algorithm first partitions the database into groups of similar transactions by using locality sensitive hashing and calculates a summary for each of these groups. The support of a query itemset is then estimated by means of these summaries. Our preliminary empirical results indicate that the proposed method results in a speed-up of up to a factor of 50 on large datasets. The F-measure of the output patterns varies between 0.83 and 0.99.
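The pipeline described in this abstract (group similar transactions by hashing, summarize each group, answer support queries from the summaries) might look roughly like the sketch below. The abstract does not specify the hash family or the form of the summaries, so everything concrete here is an assumption: MinHash signatures as the locality sensitive hash, per-group item counts as the summary, and an estimate that treats items as independent within a group. Items are assumed to be non-negative integers and transactions non-empty.

```python
from collections import Counter, defaultdict
from math import prod
import random

PRIME = 2_147_483_647  # modulus of the (assumed) linear hash family

def make_hashes(k, seed=0):
    """Draw k hash functions h(x) = (a*x + b) mod PRIME with a != 0."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(k)]

def minhash_signature(transaction, hashes):
    """MinHash signature: the minimum hash value of the items, per hash function."""
    return tuple(min((a * item + b) % PRIME for item in transaction)
                 for a, b in hashes)

def build_summaries(transactions, hashes):
    """Group transactions by signature; summarize each group as (size, item counts)."""
    groups = defaultdict(list)
    for t in transactions:
        groups[minhash_signature(t, hashes)].append(t)
    return [(len(members), Counter(item for t in members for item in t))
            for members in groups.values()]

def estimate_support(itemset, summaries):
    """Assumed estimator: group size times the product of item frequencies."""
    return sum(size * prod(counts[item] / size for item in itemset)
               for size, counts in summaries)

def exact_support(itemset, transactions):
    """Reference value: number of transactions that contain the whole itemset."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= t)

# Toy database: two groups of identical transactions.
DB = [frozenset({1, 2, 3})] * 3 + [frozenset({4, 5})] * 2
SUMMARIES = build_summaries(DB, make_hashes(k=4))
estimate = estimate_support({1, 2}, SUMMARIES)
```

On this toy database each group contains identical transactions, so the estimate coincides with the exact support; on mixed groups the independence assumption can overestimate or underestimate, which is one way a two-sided error can arise.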

Publication: Mining Tree Patterns with Partially Injective Homomorphisms (2019)
Schulz, Till Hendrik; Welke, Pascal
One of the main differences between inductive logic programming (ILP) and graph mining lies in the pattern matching operator applied: While it is mainly defined by relational homomorphism (i.e., subsumption) in ILP, subgraph isomorphism is the most common pattern matching operator in graph mining. Using the fact that subgraph isomorphisms are injective homomorphisms, we bridge the gap between ILP and graph mining by considering a natural transition from homomorphisms to subgraph isomorphisms that is defined by partially injective homomorphisms, i.e., which require injectivity only for subsets of the vertex pairs in the pattern. Utilizing positive complexity results on deciding homomorphisms from bounded tree-width graphs, we present an algorithm mining frequent trees from arbitrary graphs w.r.t. partially injective homomorphisms. Our experimental results show that the predictive performance of the patterns obtained is comparable to that of ordinary frequent subgraphs. Thus, by preserving many of the advantageous properties of homomorphisms and subgraph isomorphisms, our approach provides a trade-off between efficiency and predictive power.
Publication: Pedestrian quantity estimation with trajectory patterns (2012)
In street-based mobility mining, traffic volume estimation receives increasing attention as it provides important applications such as emergency support systems, quality-of-service evaluation and billboard placement. In many real-world scenarios, empirical measurements are usually sparse due to some constraints. On the other hand, pedestrians generally show some movement preferences, especially in closed environments, e.g., train stations. We propose a Gaussian process regression based method for traffic volume estimation, which incorporates topological information and prior knowledge on preferred trajectories with a trajectory pattern kernel. Our approach also enables effectively finding the most informative sensor placements. We evaluate our method with synthetic German train station pedestrian data and real-world episodic movement data from the zoo of Duisburg. The empirical analysis demonstrates that incorporating trajectory patterns can largely improve the traffic prediction accuracy, especially when traffic networks are sparsely monitored.

Publication: Toolkit-based high-performance data mining of large data on MapReduce clusters (2009)
Adranale, D.
The enormous growth of data in a variety of applications has increased the need for high-performance data mining based on distributed environments. However, standard data mining toolkits per se do not allow the usage of computing clusters. The success of MapReduce for analyzing large data has raised a general interest in applying this model to other data-intensive applications. Unfortunately, current research has not led to an integration of GUI-based data mining toolkits with distributed file system based MapReduce systems. This paper defines novel principles for modeling and design of the user interface, the storage model and the computational model necessary for the integration of such systems. Additionally, it introduces a novel system architecture for interactive GUI-based data mining of large data on clusters based on MapReduce that overcomes the limitations of data mining toolkits. As an empirical demonstration we show an implementation based on Weka and Hadoop.
Publication: Text mining and multimedia search in a large content repository (2009)
Paaß, Gerhard
Methods of acquiring, seeking and processing knowledge are a strategically vital issue in the context of globalized competition. One of the main subjects currently being researched is the development of semantic technologies that are capable of recognizing and classifying the content and meaning of information (words, pictures or sounds). In the context of the joint project CONTENTUS we show how different text mining techniques in a workflow are able to extract useful semantic information from text. In a comprehensive multimedia search engine these annotations together with text, metadata, and semantic clues extracted from multimedia documents (speech, music, video) may be used to give more focused access to information.
Publication: Context-based clustering of image search results (2009)
Wang, H.
In this work we propose to cluster image search results based on the textual contents of the referring webpages. The natural ambiguity and context-dependence of human languages lead to problems that plague modern image search engines: A user formulating a query usually has in mind just one topic, while the results produced to satisfy this query may (and usually do) belong to different topics. Therefore, only part of the search results are relevant for a user. One possible way to improve the user's experience is to cluster the results according to the topics they belong to and present the clustered results to the user. As opposed to clustering based on visual features, an approach utilising the text information in the webpages containing the image is less computationally intensive and provides the resulting clusters with semantically meaningful names.
Publication: Efficient closed pattern mining in strongly accessible set systems (2007)
Poigné, Axel