Bauckhage, Christian

Prof. Dr.-Ing.

ORCID: 0000-0001-6615-2128

Uncovering Inconsistencies and Contradictions in Financial Reports using Large Language Models

(2023-12)
Deußer, Tobias; Leonhard, David; Hillebrand, Lars Patrick; Berger, Armin; Khaled, Mohamed; Heiden, Sarah; Dilmaghani, Tim; Kliem, Bernd; Loitz, Rüdiger; Bauckhage, Christian; Sifa, Rafet

Correct identification and correction of contradictions and inconsistencies within financial reports constitute a fundamental component of the audit process. To streamline and automate this critical task, we introduce a novel approach leveraging large language models and an embedding-based paragraph clustering methodology. This paper assesses our approach across three distinct datasets, including two annotated datasets and one unannotated dataset, all within a zero-shot framework. Our findings reveal highly promising results that significantly enhance the effectiveness and efficiency of the auditing process, ultimately reducing the time required for a thorough and reliable financial report audit.
Improving Zero-Shot Text Matching for Financial Auditing with Large Language Models

(2023-08-22)
Hillebrand, Lars Patrick; Berger, Armin; Deußer, Tobias; Dilmaghani, Tim; Khaled, Mohamed; Kliem, B.; Loitz, Rüdiger; Pielka, Maren; Leonhard, David; Bauckhage, Christian; Sifa, Rafet

Auditing financial documents is a very tedious and time-consuming process. As of today, it can already be simplified by employing AI-based solutions to recommend relevant text passages from a report for each legal requirement of rigorous accounting standards. However, these methods need to be fine-tuned regularly, and they require abundant annotated data, which is often lacking in industrial environments. Hence, we present ZeroShotALI, a novel recommender system that leverages a state-of-the-art large language model (LLM) in conjunction with a domain-specifically optimized transformer-based text-matching solution. We find that a two-step approach of first retrieving a number of best matching document sections per legal requirement with a custom BERT-based model and second filtering these selections using an LLM yields significant performance improvements over existing approaches.
Contradiction Detection in Financial Reports

(2023-01-23)
Deußer, Tobias; Pielka, Maren; Pucknat, Lisa; Jacob, Basil; Dilmaghani, Tim; Nourimand, Mahdis; Kliem, Bernd; Loitz, Rüdiger; Bauckhage, Christian; Sifa, Rafet

Finding and amending contradictions in a financial report is crucial for the publishing company and its financial auditors. To automate this process, we introduce a novel approach that incorporates informed pre-training into its transformer-based architecture to infuse the model with additional part-of-speech knowledge. Furthermore, we fine-tune the model on the public Stanford Natural Language Inference Corpus and our proprietary financial contradiction dataset. It achieves a contradiction detection F1 score of 89.55% on our real-world financial contradiction dataset, beating several baselines by a considerable margin. During model selection, we also test various financial-document-specific transformer models and find that they underperform the more general embedding approaches.
Towards Automated Regulatory Compliance Verification in Financial Auditing with Large Language Models

(2023)
Berger, Armin; Hillebrand, Lars Patrick; Leonhard, David; Deußer, Tobias; Bell Felix de Oliveira, Thiago; Dilmaghani, Tim; Khaled, Mohamed; Kliem, Bernd; Loitz, Rüdiger; Bauckhage, Christian; Sifa, Rafet

The auditing of financial documents, historically a labor-intensive process, stands on the precipice of transformation. AI-driven solutions have made inroads into streamlining this process by recommending pertinent text passages from financial reports to align with the legal requirements of accounting standards. However, a glaring limitation remains: these systems commonly fall short in verifying whether the recommended excerpts indeed comply with the specific legal mandates. Hence, in this paper, we probe the efficiency of publicly available Large Language Models (LLMs) in the realm of regulatory compliance across different model configurations. We place particular emphasis on comparing cutting-edge open-source LLMs, such as Llama-2, with their proprietary counterparts like OpenAI's GPT models. This comparative analysis leverages two custom datasets provided by our partner PricewaterhouseCoopers (PwC) Germany. We find that the open-source Llama-2 model with 70 billion parameters demonstrates outstanding performance in detecting non-compliance or true negative occurrences, outperforming all of its proprietary counterparts. Nevertheless, proprietary models such as GPT-4 perform best in a broad variety of scenarios, particularly in non-English contexts.
KPI-EDGAR: A Novel Dataset and Accompanying Metric for Relation Extraction from Financial Documents

(2022-12)
Deußer, Tobias; Ali, Syed Musharraf; Hillebrand, Lars Patrick; Nurchalifah, Desiana Dien; Jacob, Basil; Bauckhage, Christian; Sifa, Rafet

We introduce KPI-EDGAR, a novel dataset for Joint Named Entity Recognition and Relation Extraction building on financial reports uploaded to the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system, where the main objective is to extract Key Performance Indicators (KPIs) from financial documents and link them to their numerical values and other attributes. We further provide four accompanying baselines for benchmarking potential future research. Additionally, we propose a new way of measuring the success of said extraction process by incorporating a word-level weighting scheme into the conventional F1 score to better model the inherently fuzzy borders of the entity pairs of a relation in this domain.
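
The idea of a word-level weighted F1 score can be illustrated with a minimal sketch. This is not the metric defined in the paper: the function names, the Jaccard overlap, and the rule of taking the weaker of the two entity overlaps are all assumptions chosen for illustration. Each relation is a pair of token-index spans (KPI span, value span), and a prediction earns partial credit in proportion to how many words it shares with a gold relation.

```python
def span_overlap(pred_tokens, gold_tokens):
    """Jaccard overlap between two sets of token indices (assumed weighting)."""
    pred, gold = set(pred_tokens), set(gold_tokens)
    if not pred and not gold:
        return 1.0
    return len(pred & gold) / len(pred | gold)


def relation_credit(relation, candidates):
    """Best partial credit of one relation against a list of candidates.

    A relation is (kpi_tokens, value_tokens). The credit for a pairing is the
    weaker of the two entity overlaps, so both ends must roughly agree.
    """
    return max(
        (
            min(span_overlap(relation[0], c[0]), span_overlap(relation[1], c[1]))
            for c in candidates
        ),
        default=0.0,
    )


def weighted_f1(predicted, gold):
    """F1 where exact-match counts are replaced by word-level partial credit."""
    precision = (
        sum(relation_credit(p, gold) for p in predicted) / len(predicted)
        if predicted
        else 0.0
    )
    recall = (
        sum(relation_credit(g, predicted) for g in gold) / len(gold)
        if gold
        else 0.0
    )
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the gold KPI spans tokens 0-2, the prediction misses token 0.
gold_relations = [((0, 1, 2), (5,))]
predicted_relations = [((1, 2), (5,))]
score = weighted_f1(predicted_relations, gold_relations)
```

Under a strict exact-match F1 the hypothetical prediction above would score zero; with partial credit it is rewarded for the two of three KPI words it recovers, which is the kind of tolerance for fuzzy entity borders the abstract describes.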
From Open Set Recognition Towards Robust Multi-class Classification

(2022-09-15)
Lübbering, Max; Gebauer, Michael; Ramamurthy, Rajkumar; Bauckhage, Christian; Sifa, Rafet

The challenges and risks of deploying deep neural networks (DNNs) in the open world are often overlooked and potentially result in severe outcomes. With our proposed informer approach, we leverage autoencoder-based outlier detectors with their sensitivity to epistemic uncertainty by ensembling multiple detectors, each learning a different one-vs-rest setting. Our results clearly show informer’s superiority compared to DNN ensembles, kernel-based DNNs, and traditional multi-layer perceptrons (MLPs) in terms of robustness to outliers and dataset shift while maintaining a competitive classification performance. Finally, we show that informer can estimate the overall uncertainty within a prediction and, in contrast to any of the other baselines, break the uncertainty estimate down into aleatoric and epistemic uncertainty. This is an essential feature in many use cases, as the underlying reasons for the uncertainty are fundamentally different and can require different actions.
Towards Generating Financial Reports from Tabular Data Using Transformers

(2022-08-11)
Chapman, Clayton; Hillebrand, Lars Patrick; Stenzel, Marc Robin; Deußer, Tobias; Biesner, David; Bauckhage, Christian; Sifa, Rafet

Financial reports are commonplace in the business world, but are long and tedious to produce. These reports mostly consist of tables with written sections describing these tables. Automating the process of creating these reports, even partially, has the potential to save a company time and resources that could be spent on more creative tasks. Some existing software uses conditional statements and sentence templates to generate the written sections. This solution lacks creativity and innovation when compared to recent advancements in NLP and deep learning. We instead implement a transformer network to solve the task of generating this text. By generating matching pairs between tables and sentences found in financial documents, we created a dataset for our transformer. We were able to achieve promising results, with the final model reaching a BLEU score of 63.3. Generated sentences are natural, grammatically correct and mostly faithful to the information found in the tables.
Gradient Flows for L2 Support Vector Machine Training

(2022-08-08)
Bauckhage, Christian; Schneider, Helen; Wulff, Benjamin; Sifa, Rafet

We explore the merits of training support vector machines for binary classification by solving systems of ordinary differential equations. We thus adopt a continuous-time perspective on a machine learning problem, which may be of interest for implementations on (re)emerging hardware platforms such as analog or quantum computers.
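
To make the continuous-time view concrete, here is a minimal sketch that is not taken from the paper. It assumes the primal L2 SVM objective L(w) = 0.5 * ||w||^2 + 0.5 * C * sum_i max(0, 1 - y_i * w.x_i)^2 (squared hinge loss, bias folded in as a constant feature) and integrates the gradient flow dw/dt = -grad L(w) with explicit Euler steps. The paper may well work with a different formulation, such as the dual problem, and a different ODE solver; all function names and parameter values below are hypothetical.

```python
def l2_svm_gradient(w, X, y, C):
    """Gradient of 0.5*||w||^2 + 0.5*C*sum(max(0, 1 - y_i * w.x_i)^2)."""
    grad = list(w)  # gradient of the regularizer is w itself
    for x_i, y_i in zip(X, y):
        score = sum(w_j * x_j for w_j, x_j in zip(w, x_i))
        slack = max(0.0, 1.0 - y_i * score)
        # Only points inside the margin (slack > 0) contribute.
        for j, x_j in enumerate(x_i):
            grad[j] -= C * slack * y_i * x_j
    return grad


def euler_gradient_flow(X, y, C=1.0, step=0.01, n_steps=2000):
    """Integrate dw/dt = -grad L(w) from w(0) = 0 with explicit Euler steps."""
    w = [0.0] * len(X[0])
    for _ in range(n_steps):
        grad = l2_svm_gradient(w, X, y, C)
        w = [w_j - step * g_j for w_j, g_j in zip(w, grad)]
    return w


# Hypothetical toy data; the last feature is a constant 1 acting as the bias.
X = [[2.0, 2.0, 1.0], [1.0, 3.0, 1.0], [-2.0, -1.0, 1.0], [-1.0, -3.0, 1.0]]
y = [1.0, 1.0, -1.0, -1.0]
w = euler_gradient_flow(X, y)
predictions = [1.0 if sum(a * b for a, b in zip(w, x)) > 0 else -1.0 for x in X]
```

The Euler loop is simply a discretization of the ODE. On analog hardware of the kind the abstract mentions, the same flow would in principle be evolved physically rather than stepped numerically.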
Anonymization of German financial documents using neural network-based language models with contextual word representations

(2022-03)
Biesner, David; Ramamurthy, Rajkumar; Loitz, Rüdiger; Lübbering, Max; Hillebrand, Lars Patrick; Ladi, Anna; Stenzel, Robin; Pielka, Maren; Bauckhage, Christian; Sifa, Rafet

The automation and digitalization of business processes have led to an increase in the need for efficient information extraction from business documents. However, financial and legal documents are often not utilized effectively by text processing or machine learning systems, partly due to the presence of sensitive information in these documents, which restricts their usage beyond authorized parties and purposes. To overcome this limitation, we develop an anonymization method for German financial and legal documents using state-of-the-art natural language processing methods based on recurrent neural nets and transformer architectures. We present a web-based application to anonymize financial documents and a large-scale evaluation of different deep learning techniques.
Towards automating Numerical Consistency Checks in Financial Reports

(2022)
Hillebrand, Lars Patrick; Deußer, Tobias; Dilmaghani, Tim; Kliem, Bernd; Loitz, Rüdiger; Bauckhage, Christian; Sifa, Rafet

We introduce KPI-Check, a novel system that automatically identifies and cross-checks semantically equivalent key performance indicators (KPIs), e.g., "revenue" or "total costs", in real-world German financial reports. It combines a financial named entity and relation extraction module with a BERT-based filtering and text pair classification component to extract KPIs from unstructured sentences before linking them to synonymous occurrences in the balance sheet and profit & loss statement. The tool achieves a high matching performance of 73.00% micro F1 on a held-out test set and is currently being deployed for a globally operating major auditing firm to assist the auditing procedure of financial statements.
