Uncovering Inconsistencies and Contradictions in Financial Reports using Large Language Models

2023-12: Deußer, Tobias; Leonhard, David; Hillebrand, Lars Patrick; Berger, Armin; Khaled, Mohamed; Heiden, Sarah; Dilmaghani, Tim; Kliem, Bernd; Loitz, Rüdiger; Bauckhage, Christian; Sifa, Rafet

Correctly identifying and correcting contradictions and inconsistencies within financial reports is a fundamental component of the audit process. To streamline and automate this critical task, we introduce a novel approach that combines large language models with an embedding-based paragraph clustering method. We evaluate the approach in a zero-shot setting on three datasets: two annotated and one unannotated. The results are highly promising and indicate that the approach can substantially improve the effectiveness and efficiency of the audit process, ultimately reducing the time required for a thorough and reliable financial report audit.
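
A minimal sketch of the embedding-based grouping step follows. It assumes that paragraph embeddings are already computed (toy 2-D vectors here) and substitutes a simple single-link threshold clustering. The function names, the threshold, and the clustering rule are illustrative assumptions, not the paper's method. Paragraph pairs that fall in the same cluster are the candidates one would then pass to a language model for a consistency check.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cluster_paragraphs(embeddings: List[Sequence[float]], threshold: float = 0.9) -> List[List[int]]:
    """Single-link clustering (assumption): paragraphs whose cosine similarity
    reaches the threshold are merged into one cluster via union-find."""
    parent = list(range(len(embeddings)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters: dict = {}
    for i in range(len(embeddings)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


def candidate_pairs(clusters: List[List[int]]) -> List[Tuple[int, int]]:
    """All paragraph pairs inside each cluster (candidates for an LLM check)."""
    return [pair for cluster in clusters for pair in combinations(cluster, 2)]


if __name__ == "__main__":
    toy = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.05, 1.0]]  # hypothetical embeddings
    groups = cluster_paragraphs(toy, threshold=0.95)
    print(groups, candidate_pairs(groups))  # [[0, 1], [2, 3]] [(0, 1), (2, 3)]
```

Restricting the expensive language-model check to pairs within a cluster keeps the number of comparisons far below all-pairs over the whole report.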

Informed Machine Learning - A Taxonomy and Survey of Integrating Prior Knowledge into Learning Systems

2023: Rueden, Laura von; Mayer, Sebastian; Beckh, Katharina; Georgiev, Bogdan; Giesselbach, Sven; Heese, Raoul; Kirsch, Birgit; Walczak, Michal; Pfrommer, Julius; Pick, Annika; Ramamurthy, Rajkumar; Garcke, Jochen; Bauckhage, Christian; Schuecker, Jannis

Despite its great success, machine learning can reach its limits when training data is insufficient. A potential solution is to integrate prior knowledge into the training process, which leads to the notion of informed machine learning. In this paper, we present a structured overview of approaches in this field. We provide a definition and propose a concept of informed machine learning that illustrates its building blocks and distinguishes it from conventional machine learning. We introduce a taxonomy that serves as a classification framework for informed machine learning approaches. It considers the source of knowledge, its representation, and its integration into the machine learning pipeline. Based on this taxonomy, we survey related research and describe how different knowledge representations, such as algebraic equations, logic rules, or simulation results, can be used in learning systems. Evaluating numerous papers on the basis of our taxonomy uncovers key methods in the field of informed machine learning.

Security of Quantum Machine Learning (Sicherheit von Quantum Machine Learning)

2022-03-24: Sultanow, Eldar; Bauckhage, Christian; Knopf, Christian; Piatkowski, Nico

According to Cybersecurity Ventures, cybercrime already moves the most money worldwide today. Will quantum computers add to this, or will they increase IT security? They open up new attack surfaces and can break "classical" security mechanisms, but they can also optimize defenses. Machine learning (ML), in the form of quantum machine learning (QML), will play an important role in this.

Towards Map-Based Validation of Semantic Segmentation Masks

2020: Rüden, Laura von; Wirtz, Tim; Hueger, Fabian; Schneider, Jan David; Bauckhage, Christian

Artificial intelligence for autonomous driving must meet strict safety and robustness requirements. We propose validating machine learning models for self-driving vehicles not only against given ground-truth labels but also with additional a priori knowledge. In particular, we suggest validating the drivable area in semantic segmentation masks using given street map data. We present first results, which indicate that map-based validation can uncover prediction errors.
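
A minimal sketch of the comparison step follows. It assumes that the street-map data has already been projected into the camera image as a binary drivable-area mask, and that registration step is not shown. The intersection-over-union (IoU) criterion, the 0.8 threshold, and the function name are illustrative assumptions, not the paper's validation rule.

```python
import numpy as np


def validate_drivable_area(pred_mask, map_mask, min_iou: float = 0.8):
    """Compare a predicted drivable-area mask with a map-derived mask.

    Both inputs are binary arrays of the same shape; the map mask is assumed
    (hypothetically) to be already projected into image coordinates.
    Returns (iou, passed).
    """
    pred = np.asarray(pred_mask, dtype=bool)
    ref = np.asarray(map_mask, dtype=bool)
    if pred.shape != ref.shape:
        raise ValueError("masks must have the same shape")
    inter = np.logical_and(pred, ref).sum()
    union = np.logical_or(pred, ref).sum()
    iou = 1.0 if union == 0 else float(inter / union)  # two empty masks agree
    return iou, bool(iou >= min_iou)


if __name__ == "__main__":
    pred = [[1, 1, 0], [1, 1, 0]]    # segmentation says: right column not drivable
    street = [[1, 1, 1], [1, 1, 1]]  # map says: whole region is road
    # IoU = 4/6, below 0.8, so this frame would be flagged for review.
    print(validate_drivable_area(pred, street))
```

A low agreement score does not say which source is wrong. It only marks frames where the prediction and the prior knowledge disagree and which deserve a closer look.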

Improving Zero-Shot Text Matching for Financial Auditing with Large Language Models

2023-08-22: Hillebrand, Lars Patrick; Berger, Armin; Deußer, Tobias; Dilmaghani, Tim; Khaled, Mohamed; Kliem, B.; Loitz, Rüdiger; Pielka, Maren; Leonhard, David; Bauckhage, Christian; Sifa, Rafet

Auditing financial documents is a very tedious and time-consuming process. As of today, it can already be simplified by employing AI-based solutions to recommend relevant text passages from a report for each legal requirement of rigorous accounting standards. However, these methods need to be fine-tuned regularly, and they require abundant annotated data, which is often lacking in industrial environments. Hence, we present ZeroShotALI, a novel recommender system that leverages a state-of-the-art large language model (LLM) in conjunction with a domain-specifically optimized transformer-based text-matching solution. We find that a two-step approach of first retrieving a number of best matching document sections per legal requirement with a custom BERT-based model and second filtering these selections using an LLM yields significant performance improvements over existing approaches.
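
The two-step retrieve-then-filter idea can be sketched as follows. Both components are hypothetical stand-ins. A Jaccard token-overlap score replaces the custom BERT-based matcher, and `llm_accepts` is any callable standing in for the LLM filter. Neither reflects the actual ZeroShotALI implementation.

```python
from typing import Callable, List


def overlap_score(requirement: str, section: str) -> float:
    """Jaccard token overlap: a hypothetical stand-in for the BERT-based matcher."""
    a, b = set(requirement.lower().split()), set(section.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 0.0


def recommend(requirement: str, sections: List[str],
              llm_accepts: Callable[[str, str], bool], k: int = 3) -> List[int]:
    """Step 1: retrieve the k best-scoring sections.
    Step 2: keep only those the (stand-in) LLM filter accepts."""
    ranked = sorted(range(len(sections)),
                    key=lambda i: overlap_score(requirement, sections[i]),
                    reverse=True)[:k]
    return [i for i in ranked if llm_accepts(requirement, sections[i])]


def keyword_filter(requirement: str, section: str) -> bool:
    """Hypothetical filter used here instead of a real LLM call."""
    return "policy" in section


if __name__ == "__main__":
    sections = ["revenue recognition policy is described",
                "employee headcount",
                "revenue grew"]
    print(recommend("disclose revenue recognition policy", sections, keyword_filter, k=2))  # [0]
```

The cheap retriever bounds how many sections the slower filter must inspect, which is the practical reason for ordering the steps this way.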

KPI-EDGAR: A Novel Dataset and Accompanying Metric for Relation Extraction from Financial Documents

2022-12: Deußer, Tobias; Ali, Syed Musharraf; Hillebrand, Lars Patrick; Nurchalifah, Desiana Dien; Jacob, Basil; Bauckhage, Christian; Sifa, Rafet

We introduce KPI-EDGAR, a novel dataset for joint named entity recognition and relation extraction built on financial reports uploaded to the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system. Its main objective is to extract Key Performance Indicators (KPIs) from financial documents and link them to their numerical values and other attributes. We further provide four accompanying baselines for benchmarking future research. Additionally, we propose a new way of measuring the success of this extraction process that incorporates a word-level weighting scheme into the conventional F1 score, to better model the inherently fuzzy boundaries of the entity pairs of a relation in this domain.

Anonymization of German financial documents using neural network-based language models with contextual word representations

2022-03: Biesner, David; Ramamurthy, Rajkumar; Loitz, Rüdiger; Lübbering, Max; Hillebrand, Lars Patrick; Ladi, Anna; Stenzel, Robin; Pielka, Maren; Bauckhage, Christian; Sifa, Rafet

The automation and digitalization of business processes have increased the need for efficient information extraction from business documents. However, text processing and machine learning systems often do not make effective use of financial and legal documents. This is partly because these documents contain sensitive information, which restricts their use to authorized parties and purposes. To overcome this limitation, we develop an anonymization method for German financial and legal documents using state-of-the-art natural language processing methods based on recurrent neural networks and transformer architectures. We present a web-based application for anonymizing financial documents and a large-scale evaluation of different deep learning techniques.
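
Once a model has tagged the sensitive spans, the replacement step itself is simple. The sketch below assumes that entity spans (character offsets plus a label) come from some NER model, which is not shown. The labels, the `[LABEL]` placeholder format, and the function name are illustrative assumptions.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, label); end is exclusive


def anonymize(text: str, spans: List[Span]) -> str:
    """Replace each predicted entity span with a [LABEL] placeholder.

    Assumes the spans do not overlap (hypothetical NER output)."""
    out, cursor = [], 0
    for start, end, label in sorted(spans):  # process left to right
        out.append(text[cursor:start])
        out.append(f"[{label}]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


if __name__ == "__main__":
    text = "Herr Schmidt zahlte 5.000 EUR an die Muster GmbH."
    p, o = text.find("Schmidt"), text.find("Muster GmbH")
    spans = [(o, o + len("Muster GmbH"), "ORG"), (p, p + len("Schmidt"), "PER")]
    print(anonymize(text, spans))  # Herr [PER] zahlte 5.000 EUR an die [ORG].
```

Replacing spans with typed placeholders rather than deleting them keeps the document readable for downstream processing while removing the identifying strings.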

Contradiction Detection in Financial Reports

2023-01-23: Deußer, Tobias; Pielka, Maren; Pucknat, Lisa; Jacob, Basil; Dilmaghani, Tim; Nourimand, Mahdis; Kliem, Bernd; Loitz, Rüdiger; Bauckhage, Christian; Sifa, Rafet

Finding and amending contradictions in a financial report is crucial for the publishing company and its financial auditors. To automate this process, we introduce a novel approach that incorporates informed pre-training into a transformer-based architecture to infuse the model with additional part-of-speech (POS) knowledge. We then fine-tune the model on the public Stanford Natural Language Inference (SNLI) Corpus and on our proprietary financial contradiction dataset. It achieves a contradiction detection F1 score of 89.55% on our real-world financial contradiction dataset, outperforming several baselines by a considerable margin. During model selection, we also test various transformer models specific to financial documents and find that they underperform the more general embedding approaches.

Gradient Flows for L2 Support Vector Machine Training

2022-08-08: Bauckhage, Christian; Schneider, Helen; Wulff, Benjamin; Sifa, Rafet

We explore the merits of training support vector machines for binary classification by solving systems of ordinary differential equations. We thus take a continuous-time perspective on a machine learning problem, which may be of interest for implementations on (re)emerging hardware platforms such as analog or quantum computers.
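
The sketch below illustrates the continuous-time view. It integrates the gradient flow dθ/dt = −∇E(θ) of the primal L2 SVM objective E(w, b) = ½‖w‖² + (C/2) Σᵢ max(0, 1 − yᵢ(wᵀxᵢ + b))² using forward Euler steps. The primal formulation, the step size, and the Euler integrator are assumptions chosen for brevity. The paper's own ODE systems may be formulated differently, for example in the dual.

```python
import numpy as np


def l2svm_gradient(w, b, X, y, C):
    """Gradient of E(w, b) = 0.5*||w||^2 + 0.5*C*sum(max(0, 1 - y*(X@w + b))**2)."""
    slack = np.maximum(0.0, 1.0 - y * (X @ w + b))  # squared-hinge residuals
    grad_w = w - C * (X.T @ (y * slack))
    grad_b = -C * np.sum(y * slack)
    return grad_w, grad_b


def gradient_flow(X, y, C=1.0, dt=0.01, steps=2000):
    """Forward-Euler integration of dw/dt = -dE/dw, db/dt = -dE/db, starting at (0, 0)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        grad_w, grad_b = l2svm_gradient(w, b, X, y, C)
        w = w - dt * grad_w
        b = b - dt * grad_b
    return w, b


if __name__ == "__main__":
    X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    w, b = gradient_flow(X, y)
    print(w, b, np.sign(X @ w + b))
```

Because the squared hinge is continuously differentiable, the right-hand side of the ODE is well defined everywhere. Forward Euler is only stable when dt is small relative to the curvature of E, so a stiff or adaptive ODE solver could be swapped in.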

Utilizing Representation Learning for Robust Text Classification Under Datasetshift

2021: Lübbering, Max; Gebauer, Michael; Ramamurthy, Rajkumar; Pielka, Maren; Bauckhage, Christian; Sifa, Rafet

In One-vs-Rest (OVR) classification, a classifier distinguishes a single class of interest (COI) from the rest, i.e., any other class. When the scope of the rest class is extended to corruptions (dataset shift), aspects of outlier detection become relevant. In this work, we show that adversarially trained autoencoders (ATA), as representatives of autoencoder-based outlier detection methods, yield tremendous robustness improvements over traditional neural network methods such as multi-layer perceptrons (MLP) and common ensemble methods, while maintaining competitive classification performance. In contrast, our results also reveal that deep learning methods optimized solely for classification tend to fail completely when exposed to dataset shift.
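
The outlier-detection view of one-vs-rest classification can be illustrated with reconstruction error. An autoencoder fitted only on the class of interest reconstructs COI samples well and reconstructs anything else poorly, including shifted or corrupted inputs. For brevity, the sketch substitutes a linear autoencoder (equivalently, PCA via SVD) for the adversarially trained autoencoders studied in the paper. The class name, the single latent dimension, and the 95% quantile threshold are illustrative assumptions.

```python
import numpy as np


class LinearAutoencoderOVR:
    """Hypothetical stand-in for an ATA: a linear autoencoder (PCA) fitted on COI samples only."""

    def __init__(self, n_components: int = 1):
        self.n_components = n_components

    def fit(self, X_coi, quantile: float = 0.95):
        self.mean_ = X_coi.mean(axis=0)
        _, _, vt = np.linalg.svd(X_coi - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]  # encoder/decoder weights
        # Accept inputs whose error is within the COI training errors' quantile.
        self.threshold_ = np.quantile(self.reconstruction_error(X_coi), quantile)
        return self

    def reconstruction_error(self, X):
        Z = (X - self.mean_) @ self.components_.T   # encode
        X_hat = Z @ self.components_ + self.mean_   # decode
        return np.linalg.norm(X - X_hat, axis=1)

    def predict(self, X):
        """True = class of interest, False = rest (other class or shifted input)."""
        return self.reconstruction_error(X) <= self.threshold_


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = rng.uniform(-1.0, 1.0, 200)
    X_coi = np.stack([t, t], axis=1) + rng.normal(0.0, 0.01, (200, 2))  # COI lies near y = x
    model = LinearAutoencoderOVR().fit(X_coi)
    print(model.predict(np.array([[0.5, 0.5], [0.5, -0.5]])))
```

A purely discriminative classifier has no notion of "far from all training data". The reconstruction-error rule, by contrast, rejects inputs that do not resemble the COI, which is the property the abstract ties to robustness under dataset shift.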