  • Publication
    Anonymization of German financial documents using neural network-based language models with contextual word representations
    The automation and digitalization of business processes have increased the need for efficient information extraction from business documents. However, financial and legal documents are often not utilized effectively by text processing or machine learning systems, partly due to the presence of sensitive information in these documents, which restricts their usage beyond authorized parties and purposes. To overcome this limitation, we develop an anonymization method for German financial and legal documents using state-of-the-art natural language processing methods based on recurrent neural nets and transformer architectures. We present a web-based application to anonymize financial documents and a large-scale evaluation of different deep learning techniques.
  • Publication
    Towards Intelligent Food Waste Prevention: An Approach Using Scalable and Flexible Harvest Schedule Optimization with Evolutionary Algorithms
    In times of climate change, growing world population, and the resulting scarcity of resources, efficient and economical usage of agricultural land is increasingly important and challenging at the same time. To avoid the disadvantages of monocropping for soil and environment, it is advisable to practice intercropping of various plant species whenever possible. However, intercropping is challenging as it requires a balanced planting schedule due to individual cultivation time frames. Maintaining a continuous harvest throughout the season is important as it reduces logistical costs and related greenhouse gas emissions, and can also help to reduce food waste. Motivated by the prevention of food waste, this work proposes a flexible optimization method for a full harvest season of large crop ensembles that complies with given economic and environmental constraints. Our approach applies evolutionary algorithms, and we further combine our evolution strategy with a sophisticated hierarchical loss function and an adaptive mutation rate. We thus transform the multi-objective problem into a pseudo-single-objective optimization problem, for which we obtain faster and better solutions than those of conventional approaches.
  • Publication
    Informed Machine Learning for Industry
    Deep neural networks have pushed the boundaries of artificial intelligence, but their training requires vast amounts of data and high-performance hardware. While truly digitised companies easily cope with these prerequisites, traditional industries still often lack the kind of data or infrastructures the current generation of end-to-end machine learning depends on. The Fraunhofer Center for Machine Learning therefore develops novel solutions which are informed by expert knowledge. These typically require less training data and are more transparent in their decision-making processes.
  • Publication
    Detecting and correcting spelling errors in high-quality Dutch Wikipedia text
    (2018)
    Beeksma, M.; Gompel, M. van; Kunneman, F.; Onrust, L.; Regnerus, B.; Vinke, D.; Brito, Eduardo
    For the CLIN28 shared task, we evaluated systems for spelling correction of high-quality text. The task focused on detecting and correcting spelling errors in Dutch Wikipedia pages. Three teams took part in the task. We compared the performance of their systems to that of a baseline system, the Dutch spelling corrector Valkuil. We evaluated the systems' performance in terms of F1 score. Although two of the three participating systems performed well in the task of correcting spelling errors, error detection proved to be a challenging task, and without exception resulted in a high false positive rate. Therefore, the F1 score of the baseline was not improved upon. This paper elaborates on each team's approach to the task, and discusses the overall challenges of correcting high-quality text.
  • Publication
    Can Computers Learn from the Aesthetic Wisdom of the Crowd?
    The social media revolution has led to an abundance of image and video data on the Internet. Since this data is typically annotated, rated, or commented upon by large communities, it provides new opportunities and challenges for computer vision. Social networking and content sharing sites seem to hold the key to the integration of context and semantics into image analysis. In this paper, we explore the use of social media in this regard. We present empirical results obtained on a set of 127,593 images with 3,741,176 tag assignments that were harvested from Flickr, a photo-sharing site. We report on how users tag and rate photos and present an approach towards automatically recognizing the aesthetic appeal of images using confidence-based classifiers to alleviate effects due to ambiguously labeled data. Our results indicate that user-generated content allows for learning about aesthetic appeal. In particular, established low-level image features seem to enable the recognition of beauty. A reliable recognition of unseemliness, on the other hand, appears to require more elaborate high-level analysis.