  • Publication
    Probabilistic frequent subtree kernels
    (2016)
    Welke, Pascal
    We propose a new probabilistic graph kernel. It is defined by the set of frequent subtrees generated from a small random sample of spanning trees of the transaction graphs. In contrast to the ordinary frequent subgraph kernel, it can be computed efficiently for arbitrary graphs. Due to its probabilistic nature, the embedding function corresponding to our graph kernel is not always correct. Our empirical results on artificial and real-world chemical datasets, however, demonstrate that the graph kernel we propose is much faster than other frequent-pattern-based graph kernels, with only marginal loss in predictive accuracy.
  • Publication
    Min-hashing for probabilistic frequent subtree feature spaces
    (2016)
    Welke, Pascal
    We propose a fast algorithm for approximating graph similarities. For its advantageous semantic and algorithmic properties, we define the similarity between two graphs by the Jaccard-similarity of their images in a binary feature space spanned by the set of frequent subtrees generated for some training dataset. Since the feature space embedding is computationally intractable, we use a probabilistic subtree isomorphism operator based on a small sample of random spanning trees and approximate the Jaccard-similarity by min-hash sketches. The partial order on the feature set defined by subgraph isomorphism allows for a fast calculation of the min-hash sketch, without explicitly performing the feature space embedding. Experimental results on real-world graph datasets show that our technique results in a fast algorithm. Furthermore, the approximated similarities are well-suited for classification and retrieval tasks in large graph datasets.
  • Publication
    On the complexity of frequent subtree mining in very simple structures
    (2015)
    Welke, Pascal
    We study the complexity of frequent subtree mining in very simple graphs beyond forests. We show for d-tenuous outerplanar graphs that frequent subtrees can be listed with polynomial delay if the cycle degree, i.e., the maximum number of blocks that share a common vertex, is bounded by some constant. The crucial step in the proof of this positive result is a polynomial time algorithm deciding subgraph isomorphism from trees into d-tenuous outerplanar graphs of bounded cycle degree. We obtain this algorithm by generalizing the algorithm of Shamir and Tsur that decides subgraph isomorphism between trees. Our results may also be of some interest to algorithmic graph theory, as they indicate that even for very simple structures, the cycle degree is a crucial parameter for the tractability of subgraph isomorphism. We also discuss some interesting problems towards generalizing the positive result of this work.
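
The second abstract above approximates the Jaccard similarity of binary feature vectors with min-hash sketches. The following is a minimal sketch of that general idea only, not the authors' algorithm: it assumes each graph has already been embedded explicitly as a set of feature indices (the paper avoids this explicit embedding), and the names `make_permutations`, `minhash_sketch`, and `estimate_jaccard` are hypothetical.

```python
import random


def make_permutations(num_features, num_hashes, seed=0):
    """Draw num_hashes random permutations of the feature indices."""
    rng = random.Random(seed)
    perms = []
    for _ in range(num_hashes):
        perm = list(range(num_features))
        rng.shuffle(perm)  # perm[f] is the rank of feature f
        perms.append(perm)
    return perms


def minhash_sketch(feature_set, perms):
    """For each permutation, keep the smallest rank among the present features.

    An empty feature set gets the sentinel rank len(perm).
    """
    return [min((perm[f] for f in feature_set), default=len(perm))
            for perm in perms]


def jaccard(a, b):
    """Exact Jaccard similarity of two sets (defined as 1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def estimate_jaccard(sketch_a, sketch_b):
    """Fraction of sketch positions on which the two sketches agree."""
    matches = sum(x == y for x, y in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


# Assumed toy data: two graphs represented by the indices of the
# frequent subtrees (features) they contain.
perms = make_permutations(num_features=100, num_hashes=2000)
graph_a = set(range(0, 60))
graph_b = set(range(30, 90))
exact = jaccard(graph_a, graph_b)
approx = estimate_jaccard(minhash_sketch(graph_a, perms),
                          minhash_sketch(graph_b, perms))
```

For a single random permutation, the two minima coincide with probability equal to the Jaccard similarity, so the fraction of agreeing positions is an unbiased estimate whose variance shrinks as the number of permutations grows.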