  • Publication
    Anonymization of German financial documents using neural network-based language models with contextual word representations
    The automation and digitalization of business processes have increased the need for efficient information extraction from business documents. However, financial and legal documents are often not utilized effectively by text processing or machine learning systems, partly due to the presence of sensitive information in these documents, which restricts their usage beyond authorized parties and purposes. To overcome this limitation, we develop an anonymization method for German financial and legal documents using state-of-the-art natural language processing methods based on recurrent neural networks and transformer architectures. We present a web-based application to anonymize financial documents and a large-scale evaluation of different deep learning techniques.
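    The entry above describes the method only in prose; as a rough, hypothetical illustration of the tagging-and-replacement idea, the Python sketch below assumes a generic span tagger (a trivial IBAN regex stands in for the neural sequence tagger described in the abstract) and masks every detected span with a typed placeholder.

      import re
      from typing import Callable, List, Tuple

      # Minimal sketch of tagger-based anonymization. `tag_sensitive_spans` is a
      # hypothetical stand-in for the neural sequence tagger described above;
      # here a simple IBAN regex keeps the example self-contained.

      def tag_sensitive_spans(text: str) -> List[Tuple[int, int, str]]:
          return [(m.start(), m.end(), "IBAN")
                  for m in re.finditer(r"\bDE\d{20}\b", text)]

      def anonymize(text: str,
                    tagger: Callable[[str], List[Tuple[int, int, str]]]) -> str:
          # Replace spans from right to left so earlier offsets stay valid.
          for start, end, label in sorted(tagger(text), reverse=True):
              text = text[:start] + f"[{label}]" + text[end:]
          return text

      if __name__ == "__main__":
          doc = "Die Zahlung erfolgt auf das Konto DE02120300000000202051."
          print(anonymize(doc, tag_sensitive_spans))   # ... Konto [IBAN].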
  • Publication
    Towards Intelligent Food Waste Prevention: An Approach Using Scalable and Flexible Harvest Schedule Optimization with Evolutionary Algorithms
    In times of climate change, a growing world population, and the resulting scarcity of resources, efficient and economical use of agricultural land is increasingly important and challenging at the same time. To avoid the disadvantages of monocropping for soil and environment, it is advisable to practice intercropping of various plant species whenever possible. However, intercropping is challenging as it requires a balanced planting schedule due to individual cultivation time frames. Maintaining a continuous harvest throughout the season is important as it reduces logistical costs and related greenhouse gas emissions, and can also help to reduce food waste. Motivated by the prevention of food waste, this work proposes a flexible optimization method for a full harvest season of large crop ensembles that complies with given economic and environmental constraints. Our approach applies evolutionary algorithms, and we further combine our evolution strategy with a sophisticated hierarchical loss function and an adaptive mutation rate. We thus transform the multi-objective problem into a pseudo-single-objective optimization problem, for which we obtain faster and better solutions than those of conventional approaches.
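    As a loose illustration of the evolution-strategy idea (not the authors' implementation), the Python sketch below runs a (1+1) evolution strategy with a success-based adaptive mutation rate on a toy planting-schedule problem; the crop model, the hierarchical loss, and the constraints are simplified stand-ins.

      import random

      # Toy setting: each of N_BEDS beds gets a planting week, every crop yields
      # for a fixed window, and the loss penalizes deviations from a constant
      # weekly harvest. The paper's hierarchical loss is not reproduced here.
      N_BEDS, SEASON_WEEKS, HARVEST_WINDOW = 40, 30, 4

      def weekly_harvest(plan):
          harvest = [0] * SEASON_WEEKS
          for start in plan:
              for w in range(start, min(start + HARVEST_WINDOW, SEASON_WEEKS)):
                  harvest[w] += 1
          return harvest

      def loss(plan):
          target = N_BEDS * HARVEST_WINDOW / SEASON_WEEKS
          return sum((h - target) ** 2 for h in weekly_harvest(plan))

      def evolve(iterations=5000, seed=0):
          rng = random.Random(seed)
          plan = [rng.randrange(SEASON_WEEKS) for _ in range(N_BEDS)]
          best = loss(plan)
          sigma = 4.0                          # mutation strength, adapted below
          for _ in range(iterations):
              child = [min(SEASON_WEEKS - 1, max(0, p + int(rng.gauss(0, sigma))))
                       for p in plan]
              c = loss(child)
              if c <= best:
                  plan, best = child, c
                  sigma *= 1.22                # successful step: widen the search
              else:
                  sigma = max(0.5, sigma * 0.95)   # failure: narrow the search
          return plan, best

      if __name__ == "__main__":
          plan, best = evolve()
          print("final loss:", best)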
  • Publication
    Performance of ECG-based seizure detection algorithms strongly depends on training and test conditions
    (2021) Jahanbekam, A.; Baumann, J.; Nass, R.D.; Hill, H.; Elger, C.E.; Surges, R.
    Objective. To identify non-EEG-based signals and algorithms for the detection of motor and non-motor seizures in people lying in bed during video-EEG (VEEG) monitoring, and to test whether these algorithms work in freely moving people during mobile EEG recordings. Methods. Data of three groups of adult people with epilepsy (PwE) were analyzed. Group 1 underwent VEEG with additional devices (accelerometry, ECG, electrodermal activity); group 2 underwent VEEG; and group 3 underwent mobile EEG recordings, both including one-lead ECG. All seizure types were analyzed. Feature extraction and machine-learning techniques were applied to develop seizure detection algorithms. Performance was expressed as sensitivity, precision, F1 score, and false positives per 24 hours. Results. The algorithms were developed in group 1 (35 PwE, 33 seizures) and achieved the best results (F1 score 56%, sensitivity 67%, precision 45%, false positives 0.7/24 hours) when ECG features alone were used, with no improvement from including accelerometry and electrodermal activity. In group 2 (97 PwE, 255 seizures), this ECG-based algorithm largely achieved the same performance (F1 score 51%, sensitivity 39%, precision 73%, false positives 0.4/24 hours). In group 3 (30 PwE, 51 seizures), the same ECG-based algorithm failed to match the performance in groups 1 and 2 (F1 score 27%, sensitivity 31%, precision 23%, false positives 1.2/24 hours). ECG-based algorithms were also separately trained on data of groups 2 and 3 and tested on the data of the other groups, yielding maximal F1 scores between 8% and 26%. Significance. Our results suggest that algorithms based on ECG features alone can provide clinically meaningful performance for automatic detection of all seizure types. Our study also underscores that the circumstances under which such algorithms are developed and the selection of training and test data sets need to be considered, as they limit the application of such systems to unseen patient groups behaving under different conditions.
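    For reference, the reported performance measures relate to each other as in the small Python helper below; the seizure/alarm matching rule and the study's actual counts are not reproduced (the example numbers are purely illustrative).

      # Sensitivity, precision, F1 score and false positives per 24 h from
      # already-matched detection counts.

      def detection_metrics(tp: int, fp: int, fn: int, recording_hours: float) -> dict:
          sensitivity = tp / (tp + fn) if tp + fn else 0.0   # recall over seizures
          precision = tp / (tp + fp) if tp + fp else 0.0     # fraction of true alarms
          f1 = (2 * precision * sensitivity / (precision + sensitivity)
                if precision + sensitivity else 0.0)
          return {"sensitivity": sensitivity, "precision": precision,
                  "f1": f1, "fp_per_24h": fp * 24.0 / recording_hours}

      if __name__ == "__main__":
          # Illustrative numbers only, not the study's data.
          print(detection_metrics(tp=22, fp=27, fn=11, recording_hours=33 * 24))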
  • Publication
    Matrix- and Tensor Factorization for Game Content Recommendation
    The commercial success of modern freemium games hinges on player satisfaction and retention. This calls for the customization of game content or game mechanics in order to keep players engaged. However, whereas game content is already frequently generated using procedural content generation, methods that can reliably assess what kind of content suits a player's skills or preferences are still few and far between. Addressing this challenge, we propose novel recommender systems based on latent factor models that allow for recommending quests in a single-player role-playing game. In particular, we introduce a tensor factorization algorithm to decompose collections of bipartite matrices which represent how players' interests and behaviors change over time. Extensive online bucket-type tests during the ongoing operation of a commercial game reveal that our system is able to recommend more engaging quests and to retain more players than previous handcrafted or collaborative filtering approaches.
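    As a generic, simplified illustration of the latent factor models mentioned above (not the paper's tensor factorization), the Python sketch below fits a small matrix factorization by stochastic gradient descent and predicts unobserved player-quest engagement scores; the data and hyperparameters are illustrative.

      import numpy as np

      # Player-by-quest engagement R is approximated as P @ Q.T from observed
      # entries only; missing entries are then predicted from the factors.

      def factorize(R, mask, k=3, lr=0.01, reg=0.05, epochs=500, seed=0):
          rng = np.random.default_rng(seed)
          n_players, n_quests = R.shape
          P = 0.1 * rng.standard_normal((n_players, k))
          Q = 0.1 * rng.standard_normal((n_quests, k))
          rows, cols = np.nonzero(mask)
          for _ in range(epochs):
              for i, j in zip(rows, cols):
                  err = R[i, j] - P[i] @ Q[j]
                  P[i] += lr * (err * Q[j] - reg * P[i])
                  Q[j] += lr * (err * P[i] - reg * Q[j])
          return P, Q

      if __name__ == "__main__":
          R = np.array([[5.0, 3.0, 0.0], [4.0, 0.0, 1.0], [0.0, 2.0, 5.0]])
          mask = R > 0                      # observed engagement only
          P, Q = factorize(R, mask)
          print(np.round(P @ Q.T, 2))       # includes predictions for unseen quests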
  • Publication
    Informed Machine Learning for Industry
    Deep neural networks have pushed the boundaries of artificial intelligence, but their training requires vast amounts of data and high-performance hardware. While truly digitised companies easily cope with these prerequisites, traditional industries still often lack the kind of data or infrastructures the current generation of end-to-end machine learning depends on. The Fraunhofer Center for Machine Learning therefore develops novel solutions that are informed by expert knowledge. These typically require less training data and are more transparent in their decision-making processes.
  • Publication
    Detecting and correcting spelling errors in high-quality Dutch Wikipedia text
    (2018) Beeksma, M.; Gompel, M. van; Kunneman, F.; Onrust, L.; Regnerus, B.; Vinke, D.; Brito, Eduardo
    For the CLIN28 shared task, we evaluated systems for spelling correction of high-quality text. The task focused on detecting and correcting spelling errors in Dutch Wikipedia pages. Three teams took part in the task. We compared the performance of their systems to that of a baseline system, the Dutch spelling corrector Valkuil. We evaluated the systems' performance in terms of F1 score. Although two of the three participating systems performed well in the task of correcting spelling errors, error detection proved to be a challenging task, and without exception resulted in a high false positive rate. Therefore, the F1 score of the baseline was not improved upon. This paper elaborates on each team's approach to the task, and discusses the overall challenges of correcting high-quality text.
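    As a purely illustrative toy (none of the participating systems or the Valkuil baseline are reproduced here), the Python sketch below separates detection (unknown word) from correction (closest lexicon entry) and shows how a small lexicon immediately produces the kind of false positives discussed above.

      from difflib import get_close_matches

      # Tiny toy lexicon; real systems use far richer language models.
      LEXICON = {"de", "het", "een", "artikel", "pagina", "wikipedia", "taal"}

      def detect_and_correct(tokens):
          corrected, flags = [], []
          for tok in tokens:
              if tok.lower() in LEXICON:
                  corrected.append(tok)
                  flags.append(False)
              else:
                  flags.append(True)       # detection: unknown word is flagged
                  match = get_close_matches(tok.lower(), LEXICON, n=1, cutoff=0.75)
                  corrected.append(match[0] if match else tok)
          return corrected, flags

      if __name__ == "__main__":
          tokens = ["het", "artikle", "over", "wikipedia"]
          print(detect_and_correct(tokens))
          # "artikle" is corrected to "artikel"; "over" is a detection false
          # positive because the toy lexicon does not contain it.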
  • Publication
    Prediction of memory formation based on absolute electroencephalographic phases in rhinal cortex and hippocampus outperforms prediction based on stimulus-related phase shifts
    (2018) Derner, M.; Jahanbekam, A.; Axmacher, N.; Fell, J.
    Absolute (i.e. measured) rhinal and hippocampal phase values are predictive of memory formation. It has been an open question whether the capability of mediotemporal structures to react to stimulus presentation with phase shifts may be similarly indicative of successful memory formation. We analysed data from 27 epilepsy patients implanted with depth electrodes in the hippocampus and entorhinal cortex who performed a continuous word recognition task. Electroencephalographic phase information related to the first presentation of repeatedly presented words was used to predict subsequent remembering vs. forgetting by applying a support vector machine. The capability to predict successful memory formation based on stimulus-related phase shifts was compared to that based on absolute phase values. Average hippocampal phase shifts were larger, and rhinal phase shifts were more accumulated, for later remembered compared to forgotten trials. Nevertheless, prediction based on absolute phase values clearly outperformed prediction based on phase shifts, and there was no significant increase in prediction accuracy when combining both measures. Our findings indicate that absolute rhinal and hippocampal phases, and not stimulus-related phase shifts, are most relevant for successful memory formation. Absolute phases possibly affect memory formation by influencing neural membrane potentials and thereby controlling the timing of neural firing.
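    A schematic, synthetic-data sketch of the prediction step is given below: absolute phases are mapped to sine/cosine features and classified with a support vector machine (scikit-learn is assumed to be available); the electrode layout, frequency bands, and the study's actual feature construction are not reproduced.

      import numpy as np
      from sklearn.svm import SVC
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      n_trials, n_channels = 200, 4

      # Synthetic absolute phases with a weak phase preference on channel 0
      # for "remembered" trials (label 1).
      phases = rng.uniform(-np.pi, np.pi, size=(n_trials, n_channels))
      labels = rng.integers(0, 2, size=n_trials)
      phases[labels == 1, 0] = rng.vonmises(mu=0.5, kappa=1.0,
                                            size=int((labels == 1).sum()))

      # Circular quantities are embedded as (cos, sin) pairs before the SVM.
      features = np.hstack([np.cos(phases), np.sin(phases)])
      clf = SVC(kernel="rbf", C=1.0)
      scores = cross_val_score(clf, features, labels, cv=5)
      print("mean CV accuracy:", scores.mean().round(3))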
  • Publication
    Simplex Volume Maximization (SiVM): A matrix factorization algorithm with non-negative constraints and low computing demands for the interpretation of full spectral X-ray fluorescence imaging data
    (2017) Alfeld, M.; Snickt, G. van der; Noble, P.; Janssens, K.; Wellenreuther, G.; Falkenberg, G.
    Technological progress allows for an ever-faster acquisition of hyperspectral data, challenging the users to keep up with interpreting the recorded data. Matrix factorization, the representation of data sets by bases (or loads) and coefficient (or score) images, has long been used to support the interpretation of complex data sets. In this publication we propose Simplex Volume Maximization (SiVM) for the analysis of X-ray fluorescence (XRF) imaging data sets. SiVM selects archetypical data points that represent the data set and thus provides easily understandable bases, preserves the non-negative character of XRF data sets, and has low demands concerning computing resources. We apply SiVM to an XRF data set of Hans Memling's Portrait of a man from the Lespinette family from the collection of the Mauritshuis (The Hague, NL) and discuss capabilities and shortcomings of SiVM.
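    The Python sketch below gives a much simplified, greedy variant of the SiVM idea (not the original algorithm): archetypal pixel spectra are selected via a cheap distance-based volume proxy, and non-negative scores are then fitted by projected gradient descent; the data and parameters are synthetic.

      import numpy as np

      def select_archetypes(X, k, seed=0):
          # X: (n_pixels, n_energy_channels) spectra; returns k row indices that
          # greedily maximize spread as a rough proxy for simplex volume.
          rng = np.random.default_rng(seed)
          first = int(rng.integers(X.shape[0]))
          selected = [int(np.argmax(np.linalg.norm(X - X[first], axis=1)))]
          for _ in range(k - 1):
              d = np.min(np.stack([np.linalg.norm(X - X[i], axis=1)
                                   for i in selected]), axis=0)
              selected.append(int(np.argmax(d)))   # farthest from current set
          return selected

      def nonneg_scores(X, W, steps=200, lr=1e-3):
          # Non-negative coefficients H with X.T ~ W.T @ H, via projected gradient.
          H = np.full((W.shape[0], X.shape[0]), 1.0 / W.shape[0])
          for _ in range(steps):
              grad = W @ W.T @ H - W @ X.T
              H = np.clip(H - lr * grad, 0.0, None)
          return H

      if __name__ == "__main__":
          X = np.abs(np.random.default_rng(1).standard_normal((500, 64)))  # fake spectra
          idx = select_archetypes(X, k=4)
          W = X[idx]                        # archetypal spectra (the "bases")
          H = nonneg_scores(X, W)           # per-pixel score images
          print(idx, H.shape)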
  • Publication
    High dimensional low sample size activity recognition using geometric classifiers
    (2015) Cheema, Muhammad Shahzad; Eweiwi, Abdalrahman
    Research on high dimension, low sample size (HDLSS) data has revealed their neighborless nature. This paper addresses the classification of HDLSS image or video data for human activity recognition. Existing approaches often use off-the-shelf classifiers such as nearest neighbor techniques or support vector machines and tend to ignore the geometry of the underlying feature distributions. Addressing this issue, we investigate different geometric classifiers and affirm the lack of neighborhoods within HDLSS data. As this undermines proximity-based methods and may cause over-fitting for discriminant methods, we propose a QR factorization approach to Nearest Affine Hull (NAH) classification which remedies the HDLSS dilemma and noticeably reduces the time and memory requirements of existing methods. We show that the resulting non-parametric models provide smooth decision surfaces and yield efficient and accurate solutions in multiclass HDLSS scenarios. On several action recognition benchmarks, the proposed NAH classifier outperforms other instance-based methods and shows performance competitive with or superior to SVMs. In addition, for online settings, the proposed NAH method is faster than online SVMs.
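    As a compact illustration of nearest affine hull classification in an HDLSS setting (similar in spirit to, but not identical with, the QR-based method proposed above), the Python sketch below projects a query onto each class's affine hull via a thin QR decomposition and assigns the class with the smallest residual; the data is synthetic.

      import numpy as np

      def nah_distances(query, class_samples):
          # class_samples: dict label -> (n_i, d) array of training vectors.
          dists = {}
          for label, X in class_samples.items():
              base = X[0]
              # Orthonormal basis of the hull's direction space span(x_i - x_0).
              Q, _ = np.linalg.qr((X[1:] - base).T, mode="reduced")
              r = query - base
              residual = r - Q @ (Q.T @ r)     # component outside the affine hull
              dists[label] = np.linalg.norm(residual)
          return dists

      def nah_classify(query, class_samples):
          d = nah_distances(query, class_samples)
          return min(d, key=d.get)

      if __name__ == "__main__":
          rng = np.random.default_rng(0)
          d, n_per_class = 2000, 10                   # HDLSS: many dims, few samples
          classes = {c: rng.standard_normal((n_per_class, d)) + 0.5 * c
                     for c in range(3)}
          q = rng.standard_normal(d) + 0.5 * 2        # generated near class 2
          print(nah_classify(q, classes))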
  • Publication
    Exploring human vision driven features for pedestrian detection
    (2015) Zhang, S.S.; Klein, Dominik A.; Cremers, Armin B.
    Motivated by the center-surround mechanism in the human visual attention system, we propose to use average contrast maps for the challenge of pedestrian detection in street scenes, based on the observation that pedestrians indeed exhibit discriminative contrast texture. Our first main contribution is the design of a local statistical multichannel descriptor that incorporates both color and gradient information. Second, we introduce a multidirection and multiscale contrast scheme based on grid cells to integrate expressive local variations. Contributing to the issue of selecting the most discriminative features for assessment and classification, we perform extensive comparisons with respect to statistical descriptors, contrast measurements, and scale structures. In this way, we obtain reasonable results under various configurations. Empirical findings from applying our optimized detector to the INRIA and Caltech pedestrian datasets show that our features yield state-of-the-art performance in pedestrian detection.
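    The Python sketch below is a toy rendering of the center-surround contrast idea on a single grayscale channel (the multichannel, multidirection, and multiscale machinery of the actual descriptor is omitted): each grid cell is scored by the absolute difference between its mean intensity and the mean of its eight neighboring cells.

      import numpy as np

      def contrast_map(gray, cell=8):
          # Average image intensity over non-overlapping grid cells, then score
          # each cell against the mean of its 8 surrounding cells.
          h, w = gray.shape
          gh, gw = h // cell, w // cell
          cells = gray[:gh * cell, :gw * cell].reshape(gh, cell, gw, cell).mean(axis=(1, 3))
          padded = np.pad(cells, 1, mode="edge")
          surround = sum(padded[1 + dy:1 + dy + gh, 1 + dx:1 + dx + gw]
                         for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                         if (dy, dx) != (0, 0)) / 8.0
          return np.abs(cells - surround)

      if __name__ == "__main__":
          rng = np.random.default_rng(0)
          img = rng.random((128, 64))               # stand-in for a grayscale crop
          img[40:90, 20:44] += 0.5                  # bright "pedestrian-like" blob
          cmap = contrast_map(img, cell=8)
          print(cmap.shape, float(cmap.max().round(3)))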