Now showing 1 - 10 of 12
  • Publication
    A Fast Heuristic for Computing Geodesic Closures in Large Networks
    ( 2022-11-06)
    Seiffarth, Florian
    ;
    ;
    Motivated by the increasing interest in applications of graph geodesic convexity in machine learning and data mining, we present a heuristic for approximating the geodesic convex hull of node sets in large networks. It generates a small set of (almost) maximal outerplanar spanning subgraphs for the input graph, computes the geodesic closure in each of these graphs, and regards a node as an element of the convex hull if it belongs to the closed sets for at least a user specified number of outerplanar graphs. Our heuristic algorithm runs in time linear in the number of edges of the input graph, i.e., it is faster with one order of magnitude than the standard algorithm computing the closure exactly. Its performance is evaluated empirically by approximating convexity based core-periphery decomposition of networks. Our experimental results with large real-world networks show that for most networks, the proposed heuristic was able to produce close approximations significantly faster than the standard algorithm computing the exact convex hulls. For example, while our algorithm calculated an approximate core-periphery decomposition in 5 h or less for networks with more than 20 million edges, the standard algorithm did not terminate within 50 days.
  • Publication
    Maximum Margin Separations in Finite Closure Systems
    ( 2021)
    Seiffahrt, Florian
    ;
    ;
    Monotone linkage functions provide a measure for proximities between elements and subsets of a ground set. Combining this notion with Vapniks idea of support vector machines, we extend the concepts of maximal closed set and half-space separation in finite closure systems to those with maximum margin. In particular, we define the notion of margin for finite closure systems by means of monotone linkage functions and give a greedy algorithm computing a maximum margin closed set separation for two sets efficiently. The output closed sets are maximum margin half-spaces, i.e., form a partitioning of the ground set if the closure system is Kakutani. We have empirically evaluated our approach on different synthetic datasets. In addition to binary classification of finite subsets of the Euclidean space, we considered also the problem of vertex classification in graphs. Our experimental results provide clear evidence that maximal closed set separation with maximum margin results in a much better predictive performance than that with arbitrary maximal closed sets.
  • Publication
    Maximal Closed Set and Half-Space Separations in Finite Closure Systems
    ( 2020)
    Seiffarth, Florian
    ;
    ;
    Motivated by various binary classification problems in structured data (e.g., graphs or other relational and algebraic structures), we investigate some algorithmic properties of closed set and half-space separation in abstract closure systems. Assuming that the underlying closure system is finite and given by the corresponding closure operator, we formulate some negative and positive complexity results for these two separation problems. In particular, we prove that deciding half-space separability in abstract closure systems is NP-complete in general. On the other hand, for the relaxed problem of maximal closed set separation we propose a simple greedy algorithm and show that it is efficient and has the best possible lower bound on the number of closure operator calls. As a second direction to overcome the negative result above, we consider Kakutani closure systems and show first that our greedy algorithm provides an algorithmic characterization of this kind of set systems. As one of the major potential application fields, we then focus on Kakutani closure systems over graphs and generalize a fundamental characterization result based on the Pasch axiom to graph structure partitioning of finite sets. Though the primary focus of this work is on the generality of the results obtained, we experimentally demonstrate the practical usefulness of our approach on vertex classification in different graph datasets.
  • Publication
    Probabilistic frequent subtree kernels
    ( 2016)
    Welke, Pascal
    ;
    ;
    We propose a new probabilistic graph kernel. It is defined by the set of frequent subtrees generated from a small random sample of spanning trees of the transaction graphs. In contrast to the ordinary frequent subgraph kernel it can be computed efficiently for any arbitrary graphs. Due to its probabilistic nature, the embedding function corresponding to our graph kernel is not always correct. Our empirical results on artificial and real-world chemical datasets, however, demonstrate that the graph kernel we propose is much faster than other frequent pattern based graph kernels, with only marginal loss in predictive accuracy.
  • Publication
    Min-hashing for probabilistic frequent subtree feature spaces
    ( 2016)
    Welke, Pascal
    ;
    ;
    We propose a fast algorithm for approximating graph similarities. For its advantageous semantic and algorithmic properties, we define the similarity between two graphs by the Jaccard-similarity of their images in a binary feature space spanned by the set of frequent subtrees generated for some training dataset. Since the feature space embedding is computationally intractable, we use a probabilistic subtree isomorphism operator based on a small sample of random spanning trees and approximate the Jaccard-similarity by min-hash sketches. The partial order on the feature set defined by subgraph isomorphism allows for a fast calculation of the min-hash sketch, without explicitly performing the feature space embedding. Experimental results on real-world graph datasets show that our technique results in a fast algorithm. Furthermore, the approximated similarities are well-suited for classification and retrieval tasks in large graph datasets.
  • Publication
    On the complexity of frequent subtree mining in very simple structures
    ( 2015)
    Welke, Pascal
    ;
    ;
    We study the complexity of frequent subtree mining in very simple graphs beyond forests. We show for d-tenuous outerplanar graphs that frequent subtrees can be listed with polynomial delay if the cycle degree, i.e., the maximum number of blocks that share a common vertex, is bounded by some constant. The crucial step in the proof of this positive result is a polynomial time algorithm deciding subgraph isomorphism from trees into d-tenuous outerplanar graphs of bounded cycle degree. We obtain this algorithm by generalizing the algorithm of Shamir and Tsur that decides subgraph isomorphism between trees. Our results may also be of some interest to algorithmic graph theory, as they indicate that even for very simple structures, the cycle degree is a crucial parameter for the tractability of subgraph isomorphism. We also discuss some interesting problems towards generalizing the positive result of this work.
  • Publication
    A logic-based approach to relation extraction from texts
    In recent years, text mining has moved far beyond the classical problem of text classification with an increased interest in more sophisticated processing of large text corpora, such as, for example, evaluations of complex queries. This and several other tasks are based on the essential step of relation extraction. This problem becomes a typical application of learning logic programs by considering the dependency trees of sentences as relational structures and examples of the target relation as ground atoms of a target predicate. In this way, each example is represented by a definite first-order Horn-clause. We show that an adaptation of Plotkin's least general generalization (LGG) operator can effectively be applied to such clauses and propose a simple and effective divide-and-conquer algorithm for listing a certain set of LGGs. We use these LGGs to generate binary features and compute the hypothesis by applying SVM to the feature vectors obtained. Empirical results on the ACE--2003 benchmark dataset indicate that the performance of our approach is comparable to state-of-the-art kernel methods.
  • Publication
    Frequent subgraph mining in outerplanar graphs
    In recent years there has been an increased interest in algorithms that can perform frequent pattern discovery in large databases of graph structured objects. While the frequent connected subgraph mining problem for tree datasets can be solved in incremental polynomial time, it becomes intractable for arbitrary graph databases. Existing approaches have therefore resorted to various heuristic strategies and restrictions of the search space, but have not identified a practically relevant tractable graph class beyond trees. In this paper, we define the class of so called tenuous outerplanar graphs, a strict generalization of trees, develop a frequent subgraph mining algorithm for tenuous outerplanar graphs that works in incremental polynomial time, and evaluate the algorithm empirically on the NCI molecular graph dataset.