Bauckhage, Christian

Prof. Dr.-Ing.

0000-0001-6615-2128

Uncovering Inconsistencies and Contradictions in Financial Reports using Large Language Models

(2023-12)
Deußer, Tobias; Leonhard, David; Hillebrand, Lars Patrick; Berger, Armin; Khaled, Mohamed; Heiden, Sarah; Dilmaghani, Tim; Kliem, Bernd; Loitz, Rüdiger; Bauckhage, Christian; Sifa, Rafet

Correct identification and correction of contradictions and inconsistencies within financial reports constitute a fundamental component of the audit process. To streamline and automate this critical task, we introduce a novel approach leveraging large language models and an embedding-based paragraph clustering methodology. This paper assesses our approach across three distinct datasets, including two annotated datasets and one unannotated dataset, all within a zero-shot framework. Our findings reveal highly promising results that significantly enhance the effectiveness and efficiency of the auditing process, ultimately reducing the time required for a thorough and reliable financial report audit.

Improving Zero-Shot Text Matching for Financial Auditing with Large Language Models

(2023-08-22)
Hillebrand, Lars Patrick; Berger, Armin; Deußer, Tobias; Dilmaghani, Tim; Khaled, Mohamed; Kliem, B.; Loitz, Rüdiger; Pielka, Maren; Leonhard, David; Bauckhage, Christian; Sifa, Rafet

Auditing financial documents is a very tedious and time-consuming process. As of today, it can already be simplified by employing AI-based solutions to recommend relevant text passages from a report for each legal requirement of rigorous accounting standards. However, these methods need to be fine-tuned regularly, and they require abundant annotated data, which is often lacking in industrial environments. Hence, we present ZeroShotALI, a novel recommender system that leverages a state-of-the-art large language model (LLM) in conjunction with a domain-specifically optimized transformer-based text-matching solution. We find that a two-step approach of first retrieving a number of best matching document sections per legal requirement with a custom BERT-based model and second filtering these selections using an LLM yields significant performance improvements over existing approaches.

Contradiction Detection in Financial Reports

(2023-01-23)
Deußer, Tobias; Pielka, Maren; Pucknat, Lisa; Jacob, Basil; Dilmaghani, Tim; Nourimand, Mahdis; Kliem, Bernd; Loitz, Rüdiger; Bauckhage, Christian; Sifa, Rafet

Finding and amending contradictions in a financial report is crucial for the publishing company and its financial auditors. To automate this process, we introduce a novel approach that incorporates informed pre-training into its transformer-based architecture to infuse this model with additional Part-Of-Speech knowledge. Furthermore, we fine-tune the model on the public Stanford Natural Language Inference Corpus and our proprietary financial contradiction dataset. It achieves an exceptional contradiction detection F1 score of 89.55% on our real-world financial contradiction dataset, beating several baselines by a considerable margin. During the model selection process, we also test various financial-document-specific transformer models and find that they underperform the more general embedding approaches.

Towards Automated Regulatory Compliance Verification in Financial Auditing with Large Language Models

(2023)
Berger, Armin; Hillebrand, Lars Patrick; Leonhard, David; Deußer, Tobias; Bell Felix de Oliveira, Thiago; Dilmaghani, Tim; Khaled, Mohamed; Kliem, Bernd; Loitz, Rüdiger; Bauckhage, Christian; Sifa, Rafet

The auditing of financial documents, historically a labor-intensive process, stands on the precipice of transformation. AI-driven solutions have made inroads into streamlining this process by recommending pertinent text passages from financial reports to align with the legal requirements of accounting standards. However, a glaring limitation remains: these systems commonly fall short in verifying if the recommended excerpts indeed comply with the specific legal mandates. Hence, in this paper, we probe the efficiency of publicly available Large Language Models (LLMs) in the realm of regulatory compliance across different model configurations. We place particular emphasis on comparing cutting-edge open-source LLMs, such as Llama-2, with their proprietary counterparts like OpenAI's GPT models. This comparative analysis leverages two custom datasets provided by our partner PricewaterhouseCoopers (PwC) Germany. We find that the open-source Llama-2 70 billion model demonstrates outstanding performance in detecting non-compliance or true negative occurrences, beating all of its proprietary counterparts. Nevertheless, proprietary models such as GPT-4 perform the best in a broad variety of scenarios, particularly in non-English contexts.

KPI-BERT: A Joint Named Entity Recognition and Relation Extraction Model for Financial Reports

(2022)
Hillebrand, Lars Patrick; Deußer, Tobias; Dilmaghani, Tim; Kliem, Bernd; Loitz, Rüdiger; Bauckhage, Christian; Sifa, Rafet

We present KPI-BERT, a system which employs novel methods of named entity recognition (NER) and relation extraction (RE) to extract and link key performance indicators (KPIs), e.g. "revenue" or "interest expenses", of companies from real-world German financial documents. Specifically, we introduce an end-to-end trainable architecture that is based on Bidirectional Encoder Representations from Transformers (BERT) combining a recurrent neural network (RNN) with conditional label masking to sequentially tag entities before it classifies their relations. Our model also introduces a learnable RNN-based pooling mechanism and incorporates domain expert knowledge by explicitly filtering impossible relations. We achieve a substantially higher prediction performance on a new practical dataset of German financial reports, outperforming several strong baselines including a competing state-of-the-art span-based entity tagging approach.

Towards automating Numerical Consistency Checks in Financial Reports

(2022)
Hillebrand, Lars Patrick; Deußer, Tobias; Dilmaghani, Tim; Kliem, Bernd; Loitz, Rüdiger; Bauckhage, Christian; Sifa, Rafet

We introduce KPI-Check, a novel system that automatically identifies and cross-checks semantically equivalent key performance indicators (KPIs), e.g. "revenue" or "total costs", in real-world German financial reports. It combines a financial named entity and relation extraction module with a BERT-based filtering and text pair classification component to extract KPIs from unstructured sentences before linking them to synonymous occurrences in the balance sheet and profit & loss statement. The tool achieves a high matching performance of 73.00% micro F1 on a hold-out test set and is currently being deployed for a globally operating major auditing firm to assist the auditing procedure of financial statements.

Automatic Indexing of Financial Documents via Information Extraction

(2021)
Ramamurthy, Rajkumar; Lübbering, Max; Bell, Thiago; Gebauer, Michael; Ulusay, Bilge; Uedelhoven, Daniel; Dilmaghani, Tim; Loitz, Rüdiger; Pielka, Maren; Bauckhage, Christian; Sifa, Rafet

The problem of extracting information from large volumes of unstructured documents is pervasive in the domain of financial business. Enterprises and investors need automatic methods that can extract information from these documents, particularly for indexing and efficiently retrieving information. To this end, we present a scalable end-to-end document processing system for indexing and information retrieval from large volumes of financial documents. While we show our system works for the use case of financial document processing, the entire system itself is agnostic of the document type and machine learning model type. Thus, it can be applied to any large-scale document processing task involving domain-specific extractors.
