A Fast Heuristic for Computing Geodesic Closures in Large Networks (2022-11-06)
Seiffarth, Florian
Motivated by the increasing interest in applications of graph geodesic convexity in machine learning and data mining, we present a heuristic for approximating the geodesic convex hull of node sets in large networks. It generates a small set of (almost) maximal outerplanar spanning subgraphs of the input graph, computes the geodesic closure in each of these graphs, and regards a node as an element of the convex hull if it belongs to the closed sets for at least a user-specified number of outerplanar graphs. Our heuristic algorithm runs in time linear in the number of edges of the input graph, i.e., it is faster by one order of magnitude than the standard algorithm computing the closure exactly. Its performance is evaluated empirically by approximating the convexity-based core-periphery decomposition of networks. Our experimental results with large real-world networks show that for most networks, the proposed heuristic was able to produce close approximations significantly faster than the standard algorithm computing the exact convex hulls. For example, while our algorithm calculated an approximate core-periphery decomposition in 5 h or less for networks with more than 20 million edges, the standard algorithm did not terminate within 50 days.
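
For orientation, the exact geodesic closure that the heuristic approximates can be sketched as a fixed-point iteration: keep adding every node that lies on a shortest path between two nodes already in the set. The code below is a naive sketch for small unweighted, undirected graphs. The function names and the adjacency-dict representation are assumptions made for illustration, not taken from the paper, and the outerplanar heuristic itself is not implemented here.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def geodesic_closure(adj, seed):
    """Smallest superset of seed closed under shortest paths (naive version)."""
    # All-pairs distances via one BFS per node; acceptable only for small graphs.
    dist = {u: bfs_distances(adj, u) for u in adj}
    closed = set(seed)
    while True:
        added = set()
        for u, v in combinations(closed, 2):
            if v not in dist[u]:
                continue  # u and v lie in different components
            for w in dist[u]:
                # w is on some shortest u-v path iff the distances add up.
                if w not in closed and dist[u][w] + dist[w][v] == dist[u][v]:
                    added.add(w)
        if not added:
            return closed
        closed |= added


# Hypothetical toy graph: 4-cycle 0-1-2-3-0 with a pendant node 4 attached to 0.
graph = {0: [1, 3, 4], 1: [0, 2], 2: [1, 3], 3: [2, 0], 4: [0]}
print(sorted(geodesic_closure(graph, {0, 2})))  # [0, 1, 2, 3]
```

Both shortest paths between nodes 0 and 2 are pulled into the closure, while the pendant node 4 stays outside because it lies on no shortest path between closed nodes.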
Decision Snippet Features (2021-05-05)
Welke, Pascal; Alkhoury, Fouad
Decision trees excel at interpretability of their prediction results. To achieve the required prediction accuracy, however, large ensembles of decision trees, i.e., random forests, are often used, which reduces interpretability because of their size. Additionally, their size slows down inference on modern hardware and restricts their applicability in low-memory embedded devices. We introduce Decision Snippet Features, which are obtained from small subtrees that appear frequently in trained random forests. We subsequently show that linear models on top of these features achieve comparable and sometimes even better predictive performance than the original random forest, while reducing the model size by up to two orders of magnitude.
Maximum Margin Separations in Finite Closure Systems (2021)
Seiffarth, Florian
Monotone linkage functions provide a measure for proximities between elements and subsets of a ground set. Combining this notion with Vapnik's idea of support vector machines, we extend the concepts of maximal closed set and half-space separation in finite closure systems to those with maximum margin. In particular, we define the notion of margin for finite closure systems by means of monotone linkage functions and give a greedy algorithm that efficiently computes a maximum margin closed set separation for two sets. The output closed sets are maximum margin half-spaces, i.e., form a partitioning of the ground set if the closure system is Kakutani. We have empirically evaluated our approach on different synthetic datasets. In addition to binary classification of finite subsets of the Euclidean space, we also considered the problem of vertex classification in graphs. Our experimental results provide clear evidence that maximal closed set separation with maximum margin results in a much better predictive performance than that with arbitrary maximal closed sets.
HOPS: Probabilistic Subtree Mining for Small and Large Graphs (2020)
Welke, Pascal; Seiffarth, Florian
Frequent subgraph mining, i.e., the identification of relevant patterns in graph databases, is a well-known data mining problem with high practical relevance, since next to summarizing the data, the resulting patterns can also be used to define powerful domain-specific similarity functions for prediction. In recent years, significant progress has been made towards subgraph mining algorithms that scale to complex graphs by focusing on tree patterns and probabilistically allowing a small amount of incompleteness in the result. Nonetheless, the complexity of the pattern matching component used for deciding subtree isomorphism on arbitrary graphs has significantly limited the scalability of existing approaches. In this paper, we adapt sampling techniques from mathematical combinatorics to the problem of probabilistic subtree mining in arbitrary databases of many small to medium-sized graphs or a single large graph. By restricting ourselves to tree patterns, we provide an algorithm that approximately counts or decides subtree isomorphism for arbitrary transaction graphs in sub-linear time with one-sided error. Our empirical evaluation on a range of benchmark graph datasets shows that the novel algorithm substantially outperforms state-of-the-art approaches both in the task of approximate counting of embeddings in single large graphs and in probabilistic frequent subtree mining in large databases of small to medium-sized graphs.
Maximal Closed Set and Half-Space Separations in Finite Closure Systems (2020)
Seiffarth, Florian
Motivated by various binary classification problems in structured data (e.g., graphs or other relational and algebraic structures), we investigate some algorithmic properties of closed set and half-space separation in abstract closure systems. Assuming that the underlying closure system is finite and given by the corresponding closure operator, we formulate some negative and positive complexity results for these two separation problems. In particular, we prove that deciding half-space separability in abstract closure systems is NP-complete in general. On the other hand, for the relaxed problem of maximal closed set separation we propose a simple greedy algorithm and show that it is efficient and has the best possible lower bound on the number of closure operator calls. As a second direction to overcome the negative result above, we consider Kakutani closure systems and show first that our greedy algorithm provides an algorithmic characterization of this kind of set systems. As one of the major potential application fields, we then focus on Kakutani closure systems over graphs and generalize a fundamental characterization result based on the Pasch axiom to graph structure partitioning of finite sets. Though the primary focus of this work is on the generality of the results obtained, we experimentally demonstrate the practical usefulness of our approach on vertex classification in different graph datasets.
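
One plausible reading of the greedy idea mentioned in the abstract is sketched below: start from the closures of the two input sets and try to absorb each remaining element into one side, keeping the two closed sets disjoint. The abstract does not spell out the procedure, so the function names, the iteration order, and the toy closure operator (integer intervals on a line) are assumptions made for illustration only.

```python
def greedy_separation(ground, closure, a, b):
    """Grow two disjoint closed sets around a and b; None if closures overlap."""
    h1, h2 = closure(set(a)), closure(set(b))
    if h1 & h2:
        return None  # a and b are not separable by closed sets
    for e in ground:
        if e in h1 or e in h2:
            continue
        # Try to extend the first side; fall back to the second side.
        candidate = closure(h1 | {e})
        if not candidate & h2:
            h1 = candidate
            continue
        candidate = closure(h2 | {e})
        if not candidate & h1:
            h2 = candidate
    return h1, h2


def interval_closure(s):
    """Toy closure operator (assumption): all integers between min and max."""
    return set(range(min(s), max(s) + 1)) if s else set()


ground = range(10)
left, right = greedy_separation(ground, interval_closure, {1, 2}, {7})
print(sorted(left), sorted(right))  # [0, 1, 2, 3, 4, 5, 6] [7, 8, 9]
```

In this toy interval system the two output sets partition the ground set, which matches the half-space behaviour the abstract attributes to Kakutani closure systems. Each element triggers at most two closure calls in this sketch.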
Adiabatic Quantum Computing for Max-Sum Diversification (2020)
The combinatorial problem of max-sum diversification asks for a maximally diverse subset of a given set of data. Here, we show that it can be expressed as an Ising energy minimization problem. Given this result, max-sum diversification can be solved on adiabatic quantum computers, and we present proof-of-concept simulations which support this claim. This, in turn, suggests that quantum computing might play a role in data mining. We therefore discuss quantum computing in a tutorial-like manner and elaborate on its current strengths and weaknesses for data analysis.
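
To make the energy-minimization view concrete, the sketch below encodes subset selection as a binary vector and minimizes a quadratic energy: the negative sum of pairwise distances among chosen items plus a penalty for deviating from the target subset size k. This is a generic penalty-based binary formulation (equivalent to an Ising model after the substitution x = (s + 1) / 2), not necessarily the one derived in the paper. All names are assumptions, and exhaustive enumeration stands in for a quantum annealer.

```python
from itertools import combinations, product


def diversity(dist, subset):
    """Sum of pairwise distances within subset."""
    return sum(dist[i][j] for i, j in combinations(subset, 2))


def qubo_energy(dist, x, k, penalty):
    """Energy of a 0/1 vector x: negative diversity plus a size penalty."""
    chosen = [i for i, bit in enumerate(x) if bit]
    return -diversity(dist, chosen) + penalty * (len(chosen) - k) ** 2


def solve_by_enumeration(dist, k):
    """Minimize the energy over all 2^n bit vectors (tiny instances only)."""
    n = len(dist)
    # A penalty above the total distance mass makes every vector with the
    # wrong number of ones worse than any vector with exactly k ones.
    penalty = 1 + sum(dist[i][j] for i, j in combinations(range(n), 2))
    best = min(product((0, 1), repeat=n),
               key=lambda x: qubo_energy(dist, x, k, penalty))
    return [i for i, bit in enumerate(best) if bit]


# Hypothetical data: five points on a line, pick the two most spread out.
points = [0, 1, 2, 10, 11]
dist = [[abs(p - q) for q in points] for p in points]
print(solve_by_enumeration(dist, 2))  # [0, 4]
```

The penalty weight is chosen conservatively so that the constraint on the subset size is never traded off against diversity; smaller weights may also work but would need a separate argument.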
Max-Sum Dispersion via Quantum Annealing (2019)
We devise an Ising model for the max-sum dispersion problem, which occurs in contexts such as Web search or text summarization. Given this Ising model, max-sum dispersion can be solved on adiabatic quantum computers; in proof-of-concept simulations, we solve the corresponding Schrödinger equations and observe that our approach works well.
Artificial intelligence meets IS researchers: Can it replace us? (2019)
Loebbecke, C.; Sawy, O. el; Kankanhalli, A.; Lynne Markus, M.; Te'Eni, D.
In the era of accelerating digitization and rapid advances in Artificial Intelligence (AI), more and more job tasks may be automated by AI. However, there is little critical analysis of how this will happen, if at all, and which professions it will affect to a greater or lesser extent. A few studies suggest that highly creative and knowledge-intensive tasks cannot be substituted by AI. Yet, there have been examples of creative art pieces generated by AI algorithms that even art critics could not distinguish from human-drawn paintings. As IS (and most other) researchers, we pride ourselves on the scarcity, novelty, and creativity of our work. Thus, this panel will debate the critical question for IS academics: whether AI can and will replace our major activity, IS research, or even us IS researchers.
A QUBO Formulation of the k-Medoids Problem (2019)
We are concerned with k-medoids clustering and propose a quadratic unconstrained binary optimization (QUBO) formulation of the problem of identifying k medoids among n data points without having to cluster the data. Given our QUBO formulation of this NP-hard problem, it should be possible to solve it on adiabatic quantum computers.
Leveraging Domain Knowledge for Reinforcement Learning using MMC Architectures
Despite the success of reinforcement learning methods in various simulated robotic applications, end-to-end training suffers from extensive training times due to high sample complexity and does not scale well to realistic systems. In this work, we speed up reinforcement learning by incorporating domain knowledge into policy learning. We revisit an architecture based on the mean of multiple computations (MMC) principle known from computational biology and adapt it to solve a reacher task. We approximate the policy using a simple MMC network, experimentally compare this idea to end-to-end deep learning architectures, and show that our approach reduces the number of interactions required to approximate a suitable policy by a factor of ten.