  • Publication
    A Community Detection Based Approach for Exploring Patterns in Player Reviews
    Optimizing player retention and engagement by providing tailored game content to their audience remains a challenging task for game developers. Tracking and analyzing player engagement data, such as in-game behavioral data as well as out-of-game data like online text reviews or social media postings, is crucial in identifying user concerns and capturing user preferences. Studying and understanding user reviews has therefore become an integral component of any game development process and is actively pursued as a research area. In this paper, we are interested in extracting latent and influential topics by analyzing text reviews on a popular game community website. To this end, we present an exploratory analysis that applies a hierarchical community detection-based hybrid algorithm to extract topics from a given corpus of game reviews. Our analysis reveals interesting topics and sub-topics which can be used for further downstream analysis.
  • Publication
    Patterns and Outliers in Temporal Point Processes
    The behavior of users of Web sites or services is usually analyzed on the population level because data on individual activities is typically sparse and time discrete. And while Poisson processes provide models for such data, they often lack the flexibility to cope with wide ranges of different behaviors. In this paper, we propose a clustering algorithm for temporal point process intensities that has this flexibility as we incorporate a fast spline interpolation method. We use a scalable kernel approach to handle local translation invariance and employ maximum entropy principles for model selection and outlier detection. This way, our framework allows us to uncover fine-grained dynamical classes of behavior, and we apply it to user answer time-stamps crawled from the question answering site StackOverflow, push events in the GitHub version control repository, and transactions in the Bitcoin network. Results indicate that on StackOverflow sites the user classes are divided into bursty dynamics and weekly dynamics with low replica activity. For the Bitcoin data, we find patterns of growing activity reflecting economic bubble behavior. The GitHub data reveals patterns of concentrated and periodic activity.
  • Publication
    Adiabatic quantum computing for kernel k = 2 means clustering
    Adiabatic quantum computers are tailored towards finding minimum energy states of Ising models. The quest for implementations of machine learning algorithms on such devices thus is the quest for Ising model (re-)formulations of their underlying objective functions. In this paper, we discuss how to accomplish this for the problem of kernel binary clustering. We then discuss how our models can be solved on an adiabatic quantum computing device. Finally, in simulation experiments, we numerically solve the respective Schrödinger equations and observe our approaches to yield convincing results.
  • Publication
    Towards Shortest Paths via Adiabatic Quantum Computing
    Since the first working quantum computers are now available, accelerated developments of this technology may be expected. This will likely impact graph or network analysis because quantum computers promise fast solutions for many problems in these areas. In this paper, we explore the use of adiabatic quantum computing in finding shortest paths. We devise an Ising energy minimization formulation for this task and discuss how to set up a system of quantum bits to find minimum energy states of the model. In simulation experiments, we numerically solve the corresponding Schrödinger equations and observe our approach to work well. This is evidence that shortest path computation can at least be assisted by quantum computers.
  • Publication
    Ising models for binary clustering via adiabatic quantum computing
    Existing adiabatic quantum computers are tailored towards minimizing the energies of Ising models. The quest for implementations of pattern recognition or machine learning algorithms on such devices can thus be seen as the quest for Ising model (re-)formulations of their objective functions. In this paper, we present Ising models for the tasks of binary clustering of numerical and relational data and discuss how to set up corresponding quantum registers and Hamiltonian operators. In simulation experiments, we numerically solve the respective Schrödinger equations and observe our approaches to yield convincing results.
  • Publication
    Informed machine learning through functional composition
    Addressing general problems with applied machine learning, we sketch an approach towards informed learning. The general idea is to treat data driven learning not as a parameter estimation problem but as a problem of sequencing predefined operations. We show by means of an example that this allows for incorporating expert knowledge and leads to traceable or explainable decision making systems.
  • Publication
    Inverse dynamical inheritance in stack exchange taxonomies
    Question Answering websites are popular repositories of expert knowledge and cover areas as diverse as linguistics, computer science, or mathematics. Knowledge is commonly organized via user defined tags which implicitly create population folksonomies. However, the interplay between latent knowledge structures and the answering behavior of users has not been fully explored yet. Here, we propose a model of a dynamical tagging process guided by taxonomies, devise a robust algorithm that allows us to uncover hidden topic hierarchies, and apply our method to analyze several Stack Exchange websites. Our results show that the dynamics of the system strongly correlate with uncovered taxonomies.
  • Publication
    Third party effect: Community based spreading in complex networks
    A substantial amount of network research has been devoted to the study of spreading processes and community detection without considering the role of communities in the characteristics of spreading processes. Here, we generalize the SIR model of epidemics by introducing a matrix of community infection rates to capture the heterogeneous nature of the spreading as defined by the natural characteristics of communities. We find that the spreading capabilities of one community towards another are influenced by the internal behavior of third party communities. Our results provide insights into systems with rich information structure and into populations with diverse immunological responses.
  • Publication
    Investigating and forecasting user activities in newsblogs: A study of seasonality, volatility and attention burst
    The study of collective attention is a major topic in the area of Web science as we are interested in knowing how a particular news topic or meme gains or loses popularity over time. Recent research focused on developing methods which quantify the success and popularity of topics and studied their dynamics over time. Yet, the aggregate behavior of users across content creation platforms has been largely ignored even though the popularity of news items is also linked to the way users interact with the Web platforms. In this paper, we present a novel research framework which studies the shifts of attention of a population across newsblogs. We concentrate on the commenting behavior of users on news articles, which serves as a proxy for attention to Web content. We make use of methods from signal processing and econometrics to uncover patterns in the behavior of users which then allow us to simulate and hence to forecast the behavior of a population once an attention shift occurs. Studying a dataset of over 200 blogs with 14 million news posts, we found periodic regularities in the commenting behavior, namely cycles of 7 days as well as 24 days of activity, which may be related to known scales of meme lifetimes.
  • Publication
    Interpretable matrix factorization with stochasticity constrained nonnegative DEDICOM
    Decomposition into Directed Components (DEDICOM) is a special matrix factorization technique to factorize a given asymmetric similarity matrix into a combination of a loading matrix describing the latent structures in the data and an asymmetric affinity matrix encoding the relationships between the found latent structures. Finding DEDICOM factors can be cast as a matrix norm minimization problem that requires alternating least squares updates to find appropriate factors. Yet, due to the way DEDICOM reconstructs the data, unconstrained factors might yield results that are difficult to interpret. In this paper we derive a projection-free gradient descent based alternating least squares algorithm to calculate constrained DEDICOM factors. Our algorithm constrains the loading matrix to be column-stochastic and the affinity matrix to be nonnegative for more interpretable low rank representations. Additionally, unlike most of the available approximate solutions for finding the loading matrix, our approach takes all occurrences of the loading matrix into account to assure convergence. We evaluate our algorithm on a behavioral dataset containing pairwise asymmetric associations between a variety of game titles from an online platform.
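
    The model in the DEDICOM entry above can be illustrated with a small numerical sketch. It assumes the standard DEDICOM form S ≈ A R Aᵀ, with A the loading matrix and R the affinity matrix, and only shows the reconstruction and the two constraints named in the abstract. It does not reproduce the paper's optimization algorithm. All function names and the toy matrices are hypothetical.

```python
# Toy illustration of the DEDICOM model S ~ A R A^T (pure Python, no dependencies).
# Assumption: A is the loading matrix (items x latent components),
# R is the affinity matrix (components x components). Values are made up.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]

def dedicom_reconstruct(A, R):
    """Reconstruct the (generally asymmetric) similarity matrix A R A^T."""
    return matmul(matmul(A, R), transpose(A))

def is_column_stochastic(A, tol=1e-9):
    """Check that all entries are nonnegative and every column sums to 1."""
    nonneg = all(v >= 0 for row in A for v in row)
    sums_ok = all(abs(sum(col) - 1.0) <= tol for col in zip(*A))
    return nonneg and sums_ok

def frobenius_error(S, A, R):
    """Frobenius norm of S - A R A^T, the quantity a DEDICOM fit minimizes."""
    S_hat = dedicom_reconstruct(A, R)
    return sum((s - t) ** 2 for rs, rt in zip(S, S_hat) for s, t in zip(rs, rt)) ** 0.5

# Four items, two latent components; each column of A sums to 1.
A = [[0.5, 0.0],
     [0.5, 0.0],
     [0.0, 0.5],
     [0.0, 0.5]]
# Nonnegative, asymmetric affinities: component 0 points to component 1, not back.
R = [[4.0, 2.0],
     [0.0, 4.0]]

S = dedicom_reconstruct(A, R)
```

    In this toy case the asymmetry of R carries over to S: items loading on the first component are associated with items of the second, but not the reverse, which is the kind of directed relationship the affinity matrix is meant to encode.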