  • Publication
    Uncovering Inconsistencies and Contradictions in Financial Reports using Large Language Models
    (2023-12)
    Leonhard, David; Berger, Armin; Khaled, Mohamed; Heiden, Sarah; Dilmaghani, Tim; Kliem, Bernd; Loitz, Rüdiger
    Correct identification and correction of contradictions and inconsistencies within financial reports constitute a fundamental component of the audit process. To streamline and automate this critical task, we introduce a novel approach leveraging large language models and an embedding-based paragraph clustering methodology. This paper assesses our approach across three distinct datasets, including two annotated datasets and one unannotated dataset, all within a zero-shot framework. Our findings reveal highly promising results that significantly enhance the effectiveness and efficiency of the auditing process, ultimately reducing the time required for a thorough and reliable financial report audit.
  • Publication
    Automatic scoring of Rhizoctonia crown and root rot affected sugar beet fields from orthorectified UAV images using Machine Learning
    (2023-09-27)
    Ispizua Yamati, Facundo Ramón; Barreto Alcántara, Abel Andree; Bömer, Jonas; Laufer, Daniel; Mahlein, Anne-Katrin
    Rhizoctonia crown and root rot (RCRR), caused by Rhizoctonia solani, can cause severe yield and quality losses in sugar beet. The most common strategy to control the disease is the development of resistant varieties. In the breeding process, field experiments with artificial inoculation are carried out to evaluate the performance of genotypes and varieties. The phenotyping process in breeding trials requires constant monitoring and scoring by skilled experts. This work is time-consuming and shows bias and heterogeneity depending on the experience and capacity of each individual rater. Optical sensors and artificial intelligence have demonstrated great potential to achieve higher accuracy than human raters and to standardize phenotyping applications. A workflow combining red-green-blue (RGB) and multispectral imagery acquired from an unmanned aerial vehicle (UAV) with machine learning techniques was applied to score diseased plants and plots affected by RCRR. Georeferenced annotation of UAV orthorectified images was carried out. With the annotated images, five convolutional neural networks were trained to score individual plants. The training was carried out with different image analysis strategies and data augmentation. The custom convolutional neural network trained from scratch together with a pre-trained MobileNet showed the best precision in scoring RCRR (0.73 to 0.85). The per-plot average of spectral information was used to score plots, and the benefit of adding the information obtained from the scores of individual plants was evaluated. For this purpose, machine learning models were trained together with data management strategies, and the best-performing model was chosen. A combined pipeline of Random Forest and k-Nearest Neighbors showed the best weighted precision (0.67). This research provides a reliable workflow for detecting and scoring RCRR based on aerial imagery. RCRR is often distributed heterogeneously in trial plots; therefore, considering the information from individual plants within the plots significantly improved UAV-based automated monitoring routines.
  • Publication
    Improving Zero-Shot Text Matching for Financial Auditing with Large Language Models
    (2023-08-22)
    Berger, Armin; Dilmaghani, Tim; Khaled, Mohamed; Kliem, B.; Loitz, Rüdiger; Leonhard, David
    Auditing financial documents is a very tedious and time-consuming process. As of today, it can already be simplified by employing AI-based solutions to recommend relevant text passages from a report for each legal requirement of rigorous accounting standards. However, these methods need to be fine-tuned regularly, and they require abundant annotated data, which is often lacking in industrial environments. Hence, we present ZeroShotALI, a novel recommender system that leverages a state-of-the-art large language model (LLM) in conjunction with a domain-specifically optimized transformer-based text-matching solution. We find that a two-step approach of first retrieving a number of best matching document sections per legal requirement with a custom BERT-based model and second filtering these selections using an LLM yields significant performance improvements over existing approaches.
  • Publication
    How Does Knowledge Injection Help in Informed Machine Learning?
    Informed machine learning describes the injection of prior knowledge into learning systems. It can help to improve generalization, especially when training data is scarce. However, the field is so application-driven that general analyses of the effect of knowledge injection are rare. This makes it difficult to transfer existing approaches to new applications, or to estimate potential improvements. Therefore, in this paper, we present a framework for quantifying the value of prior knowledge in informed machine learning. Our main contributions are threefold. Firstly, we propose a set of relevant metrics for quantifying the benefits of knowledge injection, comprising in-distribution accuracy, out-of-distribution robustness, and knowledge conformity. We also introduce a metric that combines performance improvement and data reduction. Secondly, we present a theoretical framework that represents prior knowledge in a function space and relates it to data representations and a trained model. This suggests that the distances between knowledge and data influence potential model improvements. Thirdly, we perform a systematic experimental study with controllable toy problems. All in all, this helps to find general answers to the question of how knowledge injection helps in informed machine learning.
  • Publication
    A New Aligned Simple German Corpus
    (2023-07)
    Toborek, Vanessa; Busch, Moritz; Boßert, Malte; Welke, Pascal
    "Leichte Sprache", the German counterpart to Simple English, is a regulated language aiming to facilitate complex written language that would otherwise stay inaccessible to different groups of people. We present a new sentence-aligned monolingual corpus for Simple German - German. It contains multiple document-aligned sources which we have aligned using automatic sentence-alignment methods. We evaluate our alignments based on a manually labelled subset of aligned documents. The quality of our sentence alignments, as measured by the F1-score, surpasses previous work. We publish the dataset under CC BY-SA and the accompanying code under MIT license.
  • Publication
    Contradiction Detection in Financial Reports
    (2023-01-23)
    Pucknat, Lisa; Jacob, Basil; Dilmaghani, Tim; Nourimand, Mahdis; Kliem, Bernd; Loitz, Rüdiger
    Finding and amending contradictions in a financial report is crucial for the publishing company and its financial auditors. To automate this process, we introduce a novel approach that incorporates informed pre-training into its transformer-based architecture to infuse the model with additional Part-Of-Speech knowledge. Furthermore, we fine-tune the model on the public Stanford Natural Language Inference Corpus and our proprietary financial contradiction dataset. It achieves an exceptional contradiction detection F1 score of 89.55% on our real-world financial contradiction dataset, beating several of our baselines by a considerable margin. During the model selection process, we also test various financial-document-specific transformer models and find that they underperform the more general embedding approaches.
  • Publication
    Preface to the Special Issue on Pattern Recognition (DAGM GCPR 2021)
    (2023)
    Förstner, Wolfgang; Gall, Juergen; Möller, Michael; Schwing, Alexander Gerhard
  • Publication
    An Empirical Evaluation of the Rashomon Effect in Explainable Machine Learning
    (2023)
    Müller, Sebastian; Toborek, Vanessa; Jakobs, Matthias; Welke, Pascal
    The Rashomon Effect describes the following phenomenon: for a given dataset there may exist many models with equally good performance but with different solution strategies. The Rashomon Effect has implications for Explainable Machine Learning, especially for the comparability of explanations. We provide a unified view on three different comparison scenarios and conduct a quantitative evaluation across different datasets, models, attribution methods, and metrics. We find that hyperparameter-tuning plays a role and that metric selection matters. Our results provide empirical support for previously anecdotal evidence and exhibit challenges for both scientists and practitioners.
  • Publication
    Informed Machine Learning - A Taxonomy and Survey of Integrating Prior Knowledge into Learning Systems
    Despite its great success, machine learning can have its limits when dealing with insufficient training data. A potential solution is the additional integration of prior knowledge into the training process which leads to the notion of informed machine learning. In this paper, we present a structured overview of various approaches in this field. We provide a definition and propose a concept for informed machine learning which illustrates its building blocks and distinguishes it from conventional machine learning. We introduce a taxonomy that serves as a classification framework for informed machine learning approaches. It considers the source of knowledge, its representation, and its integration into the machine learning pipeline. Based on this taxonomy, we survey related research and describe how different knowledge representations such as algebraic equations, logic rules, or simulation results can be used in learning systems. This evaluation of numerous papers on the basis of our taxonomy uncovers key methods in the field of informed machine learning.
  • Publication
    KPI-EDGAR: A Novel Dataset and Accompanying Metric for Relation Extraction from Financial Documents
    (2022-12)
    Ali, Syed Musharraf; Nurchalifah, Desiana Dien; Jacob, Basil
    We introduce KPI-EDGAR, a novel dataset for Joint Named Entity Recognition and Relation Extraction building on financial reports uploaded to the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system, where the main objective is to extract Key Performance Indicators (KPIs) from financial documents and link them to their numerical values and other attributes. We further provide four accompanying baselines for benchmarking potential future research. Additionally, we propose a new way of measuring the success of said extraction process by incorporating a word-level weighting scheme into the conventional F1 score to better model the inherently fuzzy borders of the entity pairs of a relation in this domain.