  • Publication
    Archetypal analysis as an autoencoder
    We present an efficient approach to archetypal analysis where we use sub-gradient algorithms for optimization over the simplex to determine archetypes and reconstruction coefficients. Runtime evaluations reveal our approach to be notably more efficient than previous techniques. As a practical application, we consider archetypal analysis for autoencoding.
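As a rough illustration of what "optimization over the simplex" involves, the sketch below computes reconstruction coefficients for a single data point given fixed archetypes. It is not the method of the paper: it swaps in plain projected gradient descent (the squared reconstruction error is smooth, so its gradient is also its only sub-gradient) together with the standard sort-based Euclidean projection onto the probability simplex. The function names, step size, and iteration count are assumptions chosen for the example.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {a : a_i >= 0, sum(a) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        t = (cumulative - 1.0) / j
        if uj - t > 0:  # holds exactly for the first rho sorted entries
            theta = t
    return [max(x - theta, 0.0) for x in v]


def reconstruction_coefficients(x, archetypes, steps=500, lr=0.1):
    """Minimize ||x - sum_j a_j * z_j||^2 over the simplex (hypothetical helper)."""
    k = len(archetypes)
    a = [1.0 / k] * k  # start at the simplex barycenter
    for _ in range(steps):
        recon = [sum(a[j] * archetypes[j][d] for j in range(k))
                 for d in range(len(x))]
        residual = [r - xd for r, xd in zip(recon, x)]
        grad = [2.0 * sum(res * zd for res, zd in zip(residual, archetypes[j]))
                for j in range(k)]
        a = project_to_simplex([aj - lr * gj for aj, gj in zip(a, grad)])
    return a


# Toy data: three archetypes spanning a triangle, one point inside it.
archetypes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
coeffs = reconstruction_coefficients((0.2, 0.3), archetypes)
print([round(c, 4) for c in coeffs])
```

The coefficients are non-negative and sum to one, so each data point is expressed as a convex combination of the archetypes.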
  • Publication
    Beyond heatmaps: Spatio-temporal clustering using behavior-based partitioning of game levels
    Evaluating the spatial behavior of players allows for comparing design intent with emergent behavior. However, spatial analytics for game development is still in its infancy and current analysis mostly relies on aggregate visualizations such as heatmaps. In this paper, we propose the use of advanced spatial clustering techniques to evaluate player behavior. In particular, we consider the use of DEDICOM and DESICOM, two techniques that operate on asymmetric spatial similarity matrices and can simultaneously uncover preferred locations and likely transitions between them. Our results highlight the ability of asymmetric techniques to partition game maps into meaningful areas and to retain information about player movements between these areas.
  • Publication
    A Comparison of Methods for Player Clustering via Behavioral Telemetry
    The analysis of user behavior in digital games has been aided by the introduction of user telemetry in game development, which provides unprecedented access to quantitative data on user behavior from the installed game clients of the entire population of players. Player behavior telemetry datasets can be exceptionally complex, with features recorded for a varying population of users over a temporal segment that can reach years in duration. Categorization of behaviors, whether through descriptive methods (e.g. segmentation) or unsupervised/supervised learning techniques, is valuable for finding patterns in the behavioral data, and developing profiles that are actionable to game developers. There are numerous methods for unsupervised clustering of user behavior, e.g. k-means/c-means, Non-negative Matrix Factorization, or Principal Component Analysis. Although all yield behavior categorizations, interpretation of the resulting categories in terms of actual play behavior can be difficult if not impossible. In this paper, a range of unsupervised techniques are applied together with Archetypal Analysis to develop behavioral clusters from playtime data of 70,014 World of Warcraft players, covering a five-year interval. The techniques are evaluated with respect to their ability to develop actionable behavioral profiles from the dataset.
  • Publication
    Predicting player churn in the wild
    Free-to-Play or 'freemium' games represent a fundamental shift in the business models of the game industry, facilitated by the increasing use of online distribution platforms and the introduction of increasingly powerful mobile platforms. The ability of a game development company to analyze and derive insights from behavioral telemetry is crucial to the success of these games, which rely on in-game purchases and in-game advertising to generate revenue, and for the company to remain competitive in a global marketplace. The ability to model, understand and predict future player behavior is of crucial value, allowing developers to obtain data-driven insights to inform design, development and marketing strategies. One of the key challenges is modeling and predicting player churn. This paper presents the first cross-game study of churn prediction in Free-to-Play games. Churn in games is discussed and thoroughly defined as a formal problem, aligning with industry standards. Furthermore, a range of features which are generic to games are defined and evaluated for their usefulness in predicting player churn, e.g. playtime, session length and session intervals. Using these behavioral features, combined with the individual retention model for each game in the dataset used, we develop a broadly applicable churn prediction model, which does not rely on game-design specific features. The presented classifiers are applied to a dataset covering five free-to-play games, resulting in high-accuracy churn prediction.
  • Publication
    Behavior evolution in Tomb Raider Underworld
    (2013) Drachen, Anders; Canossa, Alessandro
    Behavioral datasets from major commercial game titles of the 'AAA' grade generally feature high dimensionality and large sample sizes, from tens of thousands to millions, covering time scales stretching into several years of real-time, and evolving user populations. This makes dimensionality-reduction methods such as clustering and classification useful for discovering and defining patterns in player behavior. The goal from the perspective of game development is the formation of behavioral profiles that provide actionable insights into how a game is being played, and enable the detection of e.g. problems hindering player progression. Due to its unsupervised nature, clustering is notably useful in cases where no prior-defined classes exist. Previous research in this area has successfully applied clustering algorithms to behavioral datasets from different games. In this paper, the focus is on examining the behavior of 62,000 players from the major commercial game Tomb Raider: Underworld, as it unfolds from the beginning of the game and throughout the seven main levels of the game. Where previous research has focused on aggregated behavioral datasets spanning an entire game, or conversely a limited slice or snapshot viewed in isolation, this is to the best knowledge of the authors the first study to examine the application of clustering methods to player behavior as it evolves throughout an entire game.
  • Publication
    Guns, swords and data: Clustering of player behavior in computer games in the wild
    Behavioral data from computer games can be exceptionally high-dimensional, of massive scale and cover a temporal segment reaching years of real-time and a varying population of users. Clustering of user behavior provides a way to discover behavioral patterns that are actionable for game developers. Interpretability and reliability of clustering results are vital, as decisions based on them affect game design and thus ultimately revenue. Here, case studies are presented focusing on clustering analysis applied to high-dimensionality player behavior telemetry, covering a combined total of 260,000 characters from two major commercial game titles: the Massively Multiplayer Online Role-Playing Game Tera and the multi-player strategy war game Battlefield 2: Bad Company 2. K-means and Simplex Volume Maximization clustering were applied to the two datasets, combined with considerations of the design of the games, resulting in actionable behavioral profiles. Depending on the algorithm, different insights into the underlying behavior of the population of the two games are provided.
  • Publication
    Deterministic CUR for improved large-scale data analysis: An empirical study
    Low-rank approximations which are computed from selected rows and columns of a given data matrix have attracted considerable attention lately. They have been proposed as an alternative to the SVD because they naturally lead to interpretable decompositions, which have been shown to be successful in applications such as fraud detection, fMRI segmentation, and collaborative filtering. The CUR decomposition of large matrices, for example, samples rows and columns according to a probability distribution that depends on the Euclidean norm of rows or columns or on other measures of statistical leverage. At the same time, there are various deterministic approaches that do not resort to sampling and were found to often yield factorizations of superior quality with respect to reconstruction accuracy. However, these are hardly applicable to large matrices as they typically suffer from high computational costs. Consequently, many practitioners in the field of data mining have abandoned deterministic approaches in favor of randomized ones when dealing with today's large-scale data sets. In this paper, we empirically disprove this prejudice. We do so by introducing a novel, linear-time, deterministic CUR approach that adopts the recently introduced Simplex Volume Maximization approach for column selection. The latter has already been proven to be successful for NMF-like decompositions of matrices of billions of entries. Our exhaustive empirical study on more than 30 synthetic and real-world data sets demonstrates that it is also beneficial for CUR-like decompositions. Compared to other deterministic CUR-like methods, it provides comparable reconstruction quality but operates much faster so that it easily scales to matrices of billions of elements. Compared to sampling-based methods, it provides competitive reconstruction quality while staying in the same run-time complexity class.
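The sampling-based baseline mentioned in this abstract can be sketched as follows. This is not the deterministic approach the paper proposes: it only illustrates randomized column selection, under the assumption that each column is drawn with probability proportional to its squared Euclidean norm (a common choice; the abstract says only that the distribution depends on the norm). The function names are hypothetical, and columns are drawn with replacement.

```python
import random


def column_probabilities(matrix):
    """Probability of picking each column, proportional to its squared norm."""
    n_cols = len(matrix[0])
    sq_norms = [sum(row[j] ** 2 for row in matrix) for j in range(n_cols)]
    total = sum(sq_norms)
    return [s / total for s in sq_norms]


def sample_columns(matrix, c, rng):
    """Draw c column indices (with replacement) and return them with the
    corresponding column submatrix, i.e. the C factor of a CUR-style sketch."""
    probs = column_probabilities(matrix)
    idx = rng.choices(range(len(probs)), weights=probs, k=c)
    C = [[row[j] for j in idx] for row in matrix]
    return idx, C


# Toy matrix: the first column dominates, the second is all zeros.
A = [[3.0, 0.0, 1.0],
     [4.0, 0.0, 0.0]]
probs = column_probabilities(A)
idx, C = sample_columns(A, c=4, rng=random.Random(0))
print(probs, idx)
```

Rows would be selected the same way on the transposed matrix. High-norm columns dominate the sample, which is what makes the selection cheap but random.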
  • Publication
    Simplex distributions for embedding data matrices over time
    (2012) Römer, C.; Ballvora, A.; Rascher, U.; Leon, J.; Plümer, L.
    Early stress recognition is of great relevance in precision plant protection. Pre-symptomatic water stress detection is of particular interest, ultimately helping to meet the challenge of "How to feed a hungry world?". Due to climate change, this is of considerable political and public interest. Due to its large-scale and temporal nature, e.g., when monitoring plants using hyperspectral imaging, and the demand for physically meaningful results, it presents unique computational problems in scale and interpretability. However, big data matrices over time also arise in several other real-life applications such as stock market monitoring, where a business sector is characterized by the ups and downs of each of its companies per year, or topic monitoring of document collections. Therefore, we consider the general problem of embedding data matrices into Euclidean space over time without making any assumption on the generating distribution of each matrix. To do so, we represent all data samples by means of convex combinations of only few extreme ones computable in linear time. On the simplex spanned by the extremes, there are then natural candidates for distributions inducing distances between and in turn embeddings of the data matrices. We evaluate our method across several domains, including synthetic, text, and financial data as well as a large-scale dataset on water stress detection in plants with more than 3 billion matrix entries. The results demonstrate that the embeddings are meaningful and fast to compute. The stress detection results were validated by a domain expert and conform to existing plant physiological knowledge.
  • Publication
    Pre-symptomatic prediction of plant drought stress using Dirichlet-aggregation regression on hyperspectral images
    (2012) Roemer, C.; Ballvora, A.; Rascher, U.; Leon, J.; Plümer, L.
    Pre-symptomatic drought stress prediction is of great relevance in precision plant protection, ultimately helping to meet the challenge of "How to feed a hungry world?". Unfortunately, it also presents unique computational problems in scale and interpretability: it is a temporal, large-scale prediction task, e.g., when monitoring plants over time using hyperspectral imaging, and features are 'things' with a 'biological' meaning and interpretation and not just mathematical abstractions computable for any data. In this paper we propose Dirichlet-aggregation regression (DAR) to meet the challenge. DAR represents all data by means of convex combinations of only few extreme ones computable in linear time and easy to interpret. Then, it puts a Gaussian process prior on the Dirichlet distributions induced on the simplex spanned by the extremes. The prior can be a function of any observed meta feature such as time, location, type of fertilization, and plant species. We evaluated DAR on two hyperspectral image series of plants over time with about 2 (resp. 5.8) billion matrix entries. The results demonstrate that DAR can be learned efficiently and predicts stress well before it becomes visible to the human eye.
  • Publication
    How players lose interest in playing a game: An empirical study based on distributions of total playing times
    (2012) Drachen, Anders; Canossa, Alessandro
    Analyzing telemetry data of player behavior in computer games is a topic of increasing interest for industry and research alike. When applied to game telemetry data, pattern recognition and statistical analysis provide valuable business intelligence tools for game development. An important problem in this area is to characterize how player engagement in a game evolves over time. Reliable models are of pivotal interest since they allow for assessing the long-term success of game products and can provide estimates of how long players may be expected to keep actively playing a game. In this paper, we introduce methods from random process theory into game data mining in order to draw inferences about player engagement. Given large samples (over 250,000 players) of behavioral telemetry data from five different action-adventure and shooter games, we extract information as to how long individual players have played these games and apply techniques from lifetime analysis to identify common patterns. In all five cases, we find that the Weibull distribution gives a good account of the statistics of total playing times. This implies that an average player's interest in playing one of the games considered evolves according to a non-homogeneous Poisson process. Therefore, given data on the initial playtime behavior of the players of a game, it becomes possible to predict when they stop playing.
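To make the Weibull model of total playing times concrete, the sketch below fits a Weibull shape and scale to playtimes by maximum likelihood and evaluates the survival function S(t) = exp(-(t / scale)^shape), the modeled fraction of players still active after total playtime t. The abstract does not say how the authors estimated their parameters, so the bisection-based estimator, the function names, and the synthetic data (drawn with a hypothetical shape of 0.8 and scale of 20 hours) are all assumptions made for the example.

```python
import math
import random


def fit_weibull(samples, lo=1e-3, hi=20.0, iters=60):
    """Maximum-likelihood Weibull fit; the shape is found by bisection."""
    samples = [x for x in samples if x > 0]
    logs = [math.log(x) for x in samples]
    mean_log = sum(logs) / len(logs)

    def g(k):
        # Derivative condition of the log-likelihood; increasing in k.
        powers = [x ** k for x in samples]
        weighted = sum(p * l for p, l in zip(powers, logs))
        return weighted / sum(powers) - 1.0 / k - mean_log

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
    shape = 0.5 * (lo + hi)
    scale = (sum(x ** shape for x in samples) / len(samples)) ** (1.0 / shape)
    return shape, scale


def survival(t, shape, scale):
    """Modeled fraction of players whose total playtime exceeds t."""
    return math.exp(-((t / scale) ** shape))


# Synthetic total playtimes in hours; random.weibullvariate takes (scale, shape).
rng = random.Random(0)
playtimes = [rng.weibullvariate(20.0, 0.8) for _ in range(5000)]
shape_hat, scale_hat = fit_weibull(playtimes)
print(shape_hat, scale_hat, survival(10.0, shape_hat, scale_hat))
```

A fitted shape below 1 would correspond to a departure rate that falls as playtime accumulates, and a shape above 1 to one that rises.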