  • Publication
    Utilizing Representation Learning for Robust Text Classification Under Datasetshift
    In One-vs-Rest (OVR) classification, a classifier distinguishes a single class of interest (COI) from the rest, i.e., from any other class. Once the rest class is extended to include corrupted inputs (dataset shift), aspects of outlier detection become relevant. In this work, we show that adversarially trained autoencoders (ATA), representative of autoencoder-based outlier detection methods, yield substantial robustness improvements over traditional neural network methods such as multi-layer perceptrons (MLP) and over common ensemble methods, while maintaining competitive classification performance. In contrast, our results also reveal that deep learning methods optimized solely for classification tend to fail completely when exposed to dataset shift.
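    The OVR-as-outlier-detection idea can be illustrated with a minimal sketch. Assumption: a linear autoencoder (equivalent to PCA) stands in for the adversarially trained autoencoder, and the class name `LinearAutoencoderOVR`, the bottleneck size, and the 95th-percentile threshold are all hypothetical choices. The model is fitted on COI samples only, and any input it reconstructs poorly is assigned to "rest".

```python
import numpy as np


class LinearAutoencoderOVR:
    """One-vs-rest detector in the spirit of autoencoder-based outlier detection.

    Assumption: a linear autoencoder (equivalent to PCA) replaces the adversarially
    trained autoencoder; it is fitted on class-of-interest (COI) samples only.
    """

    def __init__(self, n_components=2, percentile=95.0):
        self.n_components = n_components  # size of the bottleneck
        self.percentile = percentile      # share of COI training data accepted

    def fit(self, X_coi):
        self.mean_ = X_coi.mean(axis=0)
        # The top right-singular vectors act as tied encoder/decoder weights.
        _, _, vt = np.linalg.svd(X_coi - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        # Threshold on reconstruction error, calibrated on the COI data itself.
        self.threshold_ = np.percentile(self.reconstruction_error(X_coi), self.percentile)
        return self

    def reconstruction_error(self, X):
        z = (X - self.mean_) @ self.components_.T   # encode
        x_hat = z @ self.components_ + self.mean_   # decode
        return np.linalg.norm(X - x_hat, axis=1)

    def predict(self, X):
        """Return 1 for the class of interest and 0 for 'rest' (incl. shifted inputs)."""
        return (self.reconstruction_error(X) <= self.threshold_).astype(int)


# Usage on synthetic data: the COI lies near a 2-D plane in 5-D space.
rng = np.random.default_rng(0)
basis = rng.standard_normal((2, 5))
X_coi = rng.standard_normal((500, 2)) @ basis + 0.01 * rng.standard_normal((500, 5))
X_shifted = 3.0 * rng.standard_normal((100, 5))  # stand-in for corrupted inputs

model = LinearAutoencoderOVR().fit(X_coi)
print(model.predict(X_coi).mean(), model.predict(X_shifted).mean())
```

    A pure classifier trained on COI versus a fixed set of "rest" examples has no reason to reject the shifted inputs; a reconstruction-based detector rejects them because they lie far from the COI manifold.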
  • Publication
    Combining Machine Learning and Simulation to a Hybrid Modelling Approach: Current and Future Directions
    In this paper, we describe the combination of machine learning and simulation towards a hybrid modelling approach. Such a combination of data-based and knowledge-based modelling is motivated by applications that are partly based on causal relationships, while other effects result from hidden dependencies that are represented in huge amounts of data. Our aim is to bridge the knowledge gap between the two individual communities from machine learning and simulation to promote the development of hybrid systems. We present a conceptual framework that helps to identify potential combined approaches and employ it to give a structured overview of different types of combinations using exemplary approaches of simulation-assisted machine learning and machine-learning-assisted simulation. We also discuss an advanced pairing in the context of Industry 4.0, where we see particular potential for hybrid systems.
  • Publication
    Triple Classification Using Regions and Fine-Grained Entity Typing
    (2019)
    Dong, Tiansi; Wang, Zhigang; Li, Juanzi; Cremers, Armin B.
    A Triple in a knowledge graph consists of a head, a relation, and a tail. Triple Classification determines the truth value of an unknown Triple. This task is hard for 1-to-N relations when using vector-based embedding approaches. We propose a new region-based embedding approach using fine-grained type chains. A novel geometric process is presented that extends the vectors of pre-trained entities into n-balls (n-dimensional balls) under the condition that head balls contain their tail balls. Our algorithm achieves zero energy loss and therefore serves as a case study of perfectly imposing tree structures onto vector space. An unknown Triple (h, r, x) is predicted as true when x's n-ball is located in the r-subspace of h's n-ball, following the same construction as the known tails of h. The experiments are based on large datasets derived from the benchmark datasets WN11, FB13, and WN18. Our results show that the performance of the new method is related to the length of the type chain and the quality of the pre-trained entity embeddings, and that long chains combined with well-trained entity embeddings outperform other methods in the literature.
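    The geometric core of the prediction rule is an n-ball containment test: ball (c2, r2) lies inside ball (c1, r1) exactly when ||c1 - c2|| + r2 <= r1. The sketch below shows only that test; the relation-specific r-subspace check is omitted, and `predict_triple` as well as the toy 2-D "type chain" balls are hypothetical illustrations, not the paper's construction.

```python
import math


def ball_contains(outer_center, outer_radius, inner_center, inner_radius):
    """True if the n-ball (inner_center, inner_radius) lies completely inside
    the n-ball (outer_center, outer_radius)."""
    return math.dist(outer_center, inner_center) + inner_radius <= outer_radius


def predict_triple(head_ball, candidate_ball):
    """Hypothetical, simplified rule: accept (h, r, x) if x's ball lies inside h's ball.
    The relation-specific r-subspace check from the paper is omitted here."""
    (hc, hr), (xc, xr) = head_ball, candidate_ball
    return ball_contains(hc, hr, xc, xr)


# Toy 2-D type chain: each ball contains the next (all values invented).
animal = ((0.0, 0.0), 5.0)
mammal = ((1.0, 1.0), 2.0)
dog = ((1.5, 1.0), 0.5)
print(predict_triple(animal, mammal), predict_triple(mammal, dog), predict_triple(dog, mammal))
```

    Because containment is transitive, a chain of nested balls encodes a tree path exactly, which is why zero loss amounts to imposing the tree structure perfectly.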
  • Publication
    Two Attempts to Predict Author Gender in Cross-Genre Settings in Dutch
    (2019)
    Brito, Eduardo
    This paper describes the systems designed by the Fraunhofer IAIS team for the CLIN29 shared task on cross-genre gender detection in Dutch. We present two classification approaches: a fairly standard one consisting of feature engineering and a random forest classifier, and an alternative one involving an LSTM classifier. Both are enhanced by an LDA model trained on stems. We considered various features, such as the frequency of function words, parts of speech, and sentiment. We achieved an average accuracy of 53.77% in the cross-genre settings.
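    One of the engineered features, function-word frequency, can be sketched as a relative-frequency vector. This is a minimal illustration only: the word list `FUNCTION_WORDS` is a hypothetical, tiny sample rather than the list the team used, and tokenization is naive whitespace splitting.

```python
from collections import Counter

# Hypothetical, deliberately tiny list of Dutch function words (not the team's list).
FUNCTION_WORDS = ["de", "het", "een", "en", "ik", "je", "niet", "dat", "van", "is"]


def function_word_features(text):
    """Relative frequency of each function word among the whitespace-separated tokens."""
    tokens = text.lower().split()  # naive tokenization: punctuation stays attached
    counts = Counter(tokens)
    total = len(tokens) or 1       # avoid division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


features = function_word_features("Ik zie de hond en de kat")
print(dict(zip(FUNCTION_WORDS, features)))
```

    Vectors like these, concatenated with other features, could then be passed to a random forest such as scikit-learn's `RandomForestClassifier`.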
  • Publication
    Towards Shortest Paths via Adiabatic Quantum Computing
    Now that the first working quantum computers are available, accelerated development of this technology can be expected. This will likely impact graph and network analysis, because quantum computers promise fast solutions for many problems in these areas. In this paper, we explore the use of adiabatic quantum computing for finding shortest paths. We devise an Ising energy minimization formulation for this task and discuss how to set up a system of quantum bits to find minimum-energy states of the model. In simulation experiments, we numerically solve the corresponding Schrödinger equations and observe that our approach works well. This provides evidence that shortest path computation can at least be assisted by quantum computers.
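    To make the idea of an energy whose minimum encodes a shortest path concrete, here is a sketch of one common edge-based formulation. It is not necessarily the paper's Ising model: one binary variable per edge, path cost plus a quadratic penalty for violating flow conservation. Binary variables map to Ising spins via x = (1 + s) / 2. The exhaustive search merely stands in for the adiabatic evolution, and the function name and toy graph are hypothetical.

```python
from itertools import product


def shortest_path_qubo(n_nodes, edges, source, target, penalty=None):
    """Exhaustively minimize an edge-based QUBO energy for the source-target shortest path.

    edges: list of directed (u, v, weight) tuples with non-negative weights;
    the binary variable x_e = 1 means edge e is on the path. The energy is
        sum_e w_e x_e + penalty * sum_v (out_v - in_v - b_v)^2,
    with b_v = +1 at the source, -1 at the target and 0 elsewhere.
    """
    if penalty is None:
        # Any constraint violation then costs more than all edges combined.
        penalty = sum(w for _, _, w in edges) + 1
    best = None
    for x in product((0, 1), repeat=len(edges)):  # 2^|E| candidates: toy sizes only
        cost = sum(w * xe for (_, _, w), xe in zip(edges, x))
        violation = 0
        for node in range(n_nodes):
            out_flow = sum(xe for (u, _, _), xe in zip(edges, x) if u == node)
            in_flow = sum(xe for (_, v, _), xe in zip(edges, x) if v == node)
            b = 1 if node == source else (-1 if node == target else 0)
            violation += (out_flow - in_flow - b) ** 2
        energy = cost + penalty * violation
        if best is None or energy < best[0]:
            best = (energy, [edges[i][:2] for i, xe in enumerate(x) if xe])
    return best


edges = [(0, 1, 1), (1, 3, 1), (0, 2, 1), (2, 3, 3), (0, 3, 5)]
print(shortest_path_qubo(4, edges, source=0, target=3))  # → (2, [(0, 1), (1, 3)])
```

    The penalty weight matters: if it is too small, a configuration that breaks the path constraint can have lower energy than every valid path.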
  • Publication
    Integrating lateral swaying of pedestrians into simulations
    Pedestrian simulations are a standard tool in public space design, crowd management, and evacuation management. Especially when minimizing evacuation times or identifying hazardous locations, it is vital that simulations be as accurate and realistic as possible. Although today's pedestrian simulation models give satisfactory results in many cases, they are not realistic in highly crowded scenes. In this paper, we describe a characteristic motion pattern, lateral swaying, that is commonly observed in areas of high pedestrian density and has not been taken into account in state-of-the-art pedestrian models. We extend an existing pedestrian model by integrating this motion pattern and show that the proposed model produces more realistic trajectories.
  • Publication
    Who is doing what? Simultaneous recognition of actions and actors
    (2012)
    Cheema, Muhammad Shahzad; Eweiwi, Abdalrahman
    Recognizing human actions in videos has become a rapidly growing area of research. Most existing research has focused on a single aspect only, i.e., the recognition of actions. However, humans tend to perform different actions in their own styles. In this paper, we address the problem of simultaneously identifying actions and the underlying styles (actors) in videos. We propose a hierarchical approach based on conventional action recognition and asymmetric bilinear modeling. Our approach relies solely on the dynamics of the underlying activity. Results on the multi-actor, multi-action data set IXMAS show a high recognition rate.
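    An asymmetric bilinear model writes an observation of style s (here: actor) and content c (here: action) as y^{sc} = A^s b^c. Stacking all observations and taking a truncated SVD is the usual way to fit it. The sketch below assumes the hierarchical order suggested above: the action is recognized first, then the actor is chosen as the style matrix that best explains the observation. The function names, feature dimensions, and synthetic data are hypothetical.

```python
import numpy as np


def fit_asymmetric_bilinear(Y, n_styles, dim, n_factors):
    """Fit y^{sc} = A^s b^c by truncated SVD.

    Y has shape (n_styles * dim, n_contents): rows s*dim .. (s+1)*dim - 1 hold the
    dim-dimensional observations of style s, one column per content class.
    """
    u, sing, vt = np.linalg.svd(Y, full_matrices=False)
    A = (u[:, :n_factors] * sing[:n_factors]).reshape(n_styles, dim, n_factors)
    B = vt[:n_factors, :]  # column c is the content vector b^c
    return A, B


def identify_style(y, content, A, B):
    """Given the content class (e.g. an already recognized action), pick the style
    (e.g. the actor) whose model A^s b^c best explains the observation y."""
    predictions = A @ B[:, content]  # shape (n_styles, dim)
    return int(np.argmin(np.linalg.norm(predictions - y, axis=1)))


# Synthetic check: 3 styles, 4 contents, 6-D observations, 2 latent factors.
rng = np.random.default_rng(1)
A_true = rng.standard_normal((3, 6, 2))
B_true = rng.standard_normal((2, 4))
Y = (A_true @ B_true).reshape(18, 4)
A, B = fit_asymmetric_bilinear(Y, n_styles=3, dim=6, n_factors=2)
print(identify_style(A_true[1] @ B_true[:, 2], content=2, A=A, B=B))
```

    The SVD factors are only unique up to an invertible J × J transform, but the products A^s b^c, and therefore the style decision, are unaffected by that ambiguity.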
  • Publication
    Deterministic CUR for improved large-scale data analysis: An empirical study
    Low-rank approximations computed from selected rows and columns of a given data matrix have attracted considerable attention lately. They have been proposed as an alternative to the SVD because they naturally lead to interpretable decompositions, which have proven successful in applications such as fraud detection, fMRI segmentation, and collaborative filtering. The CUR decomposition of large matrices, for example, samples rows and columns according to a probability distribution that depends on the Euclidean norm of the rows or columns or on other measures of statistical leverage. At the same time, there are various deterministic approaches that do not resort to sampling and were found to often yield factorizations of superior quality with respect to reconstruction accuracy. However, these are hardly applicable to large matrices, as they typically suffer from high computational costs. Consequently, many data mining practitioners have abandoned deterministic approaches in favor of randomized ones when dealing with today's large-scale data sets. In this paper, we empirically disprove this prejudice. We do so by introducing a novel, linear-time, deterministic CUR approach that adopts the recently introduced Simplex Volume Maximization approach for column selection. The latter has already proven successful for NMF-like decompositions of matrices with billions of entries. Our exhaustive empirical study on more than 30 synthetic and real-world data sets demonstrates that it is also beneficial for CUR-like decompositions. Compared to other deterministic CUR-like methods, it provides comparable reconstruction quality but operates much faster, so that it easily scales to matrices of billions of elements. Compared to sampling-based methods, it provides competitive reconstruction quality while staying in the same run-time complexity class.
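    The general shape of a deterministic CUR decomposition can be sketched as A ≈ C U R, where C holds selected columns, R holds selected rows, and U = C⁺ A R⁺ (⁺ denoting the pseudoinverse). Important assumption: Simplex Volume Maximization is not reproduced here. The selection step below is a simple greedy, pivoted-QR-style heuristic, and the names `greedy_select` and `deterministic_cur` are hypothetical.

```python
import numpy as np


def greedy_select(M, k):
    """Deterministically choose up to k column indices of M.

    Each step takes the column with the largest residual norm and then projects
    that direction out of all columns (a pivoted-QR-style heuristic, used here
    as a simple stand-in for Simplex Volume Maximization).
    """
    residual = np.array(M, dtype=float)
    chosen = []
    for _ in range(k):
        norms = np.linalg.norm(residual, axis=0)
        if chosen:
            norms[chosen] = -1.0          # never pick a column twice
        j = int(np.argmax(norms))
        if norms[j] <= 1e-12:             # remaining columns are already spanned
            break
        chosen.append(j)
        q = residual[:, j] / norms[j]
        residual -= np.outer(q, q @ residual)
    return chosen


def deterministic_cur(A, k):
    cols = greedy_select(A, k)      # columns of A -> C
    rows = greedy_select(A.T, k)    # rows of A -> R
    C, R = A[:, cols], A[rows, :]
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R


rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))  # rank-3 matrix
C, U, R = deterministic_cur(A, 3)
print(np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A))
```

    For an exactly rank-k matrix, k linearly independent columns and rows span its column and row spaces, so C U R reproduces A up to rounding. Each selection step touches every entry of A once, so the cost stays linear in the number of entries for fixed k.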
  • Publication
    Customers who cited this item also cited... - Comparing data from Amazon.com with the new Book Citation Index
    The new Book Citation Index (BCI) was launched by Thomson Reuters in 2011. With it, Thomson aims to complement its range of publication databases. In November 2011, the BCI included around 27,000 books. Our research-in-progress paper presents initial analyses of the BCI data, such as the distribution of subject categories and citations. In addition, we compare the data from the BCI with data from the website Amazon.com. Amazon is the world's largest online retailer, with millions of books sold through its catalogue. By matching both data sets, we found that not all of the books in the BCI are also available at Amazon. Future analyses will include a correlation analysis and a comparison of the rankings of books found in both data sets when sorted by number of citations on the one hand and by the "Amazon Best Sellers Rank" on the other. Amazon's "Customers Who Bought This Item Also Bought" recommendation system is well known. By comparing it with a book's references and citations, we wish to analyse whether or not a correlation exists between the two.
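    Comparing a citation-based ranking with the Amazon Best Sellers Rank is naturally done with a rank correlation. The sketch below uses Spearman's rho, computed as the Pearson correlation of tie-averaged ranks; the choice of Spearman's rho is an assumption, not something the paper specifies, and the toy numbers are invented. Because a Best Sellers Rank of 1 means best-selling, "more cited books sell better" would show up as a negative rho.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1   # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the (tie-averaged) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Invented toy data: citation counts and Amazon Best Sellers Rank (1 = best-selling).
citations = [10, 5, 3, 1]
sales_rank = [100, 2000, 5000, 90000]
print(spearman(citations, sales_rank))
```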
  • Publication
    How players lose interest in playing a game: An empirical study based on distributions of total playing times
    (2012)
    Drachen, Anders; Canossa, Alessandro
    Analyzing telemetry data of player behavior in computer games is a topic of increasing interest for industry and research alike. When applied to game telemetry data, pattern recognition and statistical analysis provide valuable business intelligence tools for game development. An important problem in this area is to characterize how player engagement in a game evolves over time. Reliable models are of pivotal interest, since they allow for assessing the long-term success of game products and can provide estimates of how long players may be expected to keep actively playing a game. In this paper, we introduce methods from random process theory into game data mining in order to draw inferences about player engagement. Given large samples (over 250,000 players) of behavioral telemetry data from five different action-adventure and shooter games, we extract how long individual players have played these games and apply techniques from lifetime analysis to identify common patterns. In all five cases, we find that the Weibull distribution gives a good account of the statistics of total playing times. This implies that an average player's interest in playing one of the games considered evolves according to a non-homogeneous Poisson process. Therefore, given data on the initial playtime behavior of the players of a game, it becomes possible to predict when they will stop playing.
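    A minimal sketch of the lifetime-analysis step, assuming a plain maximum-likelihood fit of a two-parameter Weibull to total playing times. The paper's exact estimation procedure is not given here, and the function names and synthetic data are hypothetical. The fitted shape k determines the hazard rate h(t) = (k/λ)(t/λ)^(k-1). For k < 1, the rate at which players quit falls as accumulated playtime grows. The survival function S(t) = exp(-(t/λ)^k) gives the probability that a player keeps playing beyond t.

```python
import math
import random


def fit_weibull(samples, lo=0.01, hi=50.0, iters=100):
    """Maximum-likelihood estimate (shape k, scale lam) of a two-parameter Weibull.

    Solves the profile-likelihood equation for k by bisection:
        sum(x^k ln x) / sum(x^k) - 1/k - mean(ln x) = 0,
    then sets lam = mean(x^k)^(1/k). Samples must be positive.
    """
    if min(samples) <= 0:
        raise ValueError("playing times must be positive")
    m = max(samples)
    logs = [math.log(x / m) for x in samples]  # rescaling by max avoids overflow
    mean_log = sum(logs) / len(logs)

    def score(k):
        weights = [math.exp(k * lg) for lg in logs]  # (x/m)^k, all in (0, 1]
        return sum(w * lg for w, lg in zip(weights, logs)) / sum(weights) - 1.0 / k - mean_log

    for _ in range(iters):  # score(k) increases monotonically in k
        mid = (lo + hi) / 2
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    k = (lo + hi) / 2
    lam = m * (sum(math.exp(k * lg) for lg in logs) / len(logs)) ** (1.0 / k)
    return k, lam


def survival(t, k, lam):
    """P(total playing time > t) under the fitted Weibull."""
    return math.exp(-((t / lam) ** k))


# Synthetic playing times (hours) drawn from a Weibull with shape 0.8, scale 10.
random.seed(0)
times = [random.weibullvariate(10.0, 0.8) for _ in range(5000)]
k, lam = fit_weibull(times)
print(round(k, 2), round(lam, 1), round(survival(20.0, k, lam), 3))
```

    Rescaling by the largest sample leaves the shape estimate unchanged and keeps every power (x/m)^k in (0, 1], so the bisection cannot overflow even for large shape values.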