Now showing 1 - 8 of 8
  • Publication
    Uncovering Inconsistencies and Contradictions in Financial Reports using Large Language Models
    ( 2023-12) ;
    Leonhard, David
    ;
    ;
    Berger, Armin
    ;
    Khaled, Mohamed
    ;
    Heiden, Sarah
    ;
    Dilmaghani, Tim
    ;
    Kliem, Bernd
    ;
    Loitz, Rüdiger
    ;
    ;
    Correct identification and correction of contradictions and inconsistencies within financial reports constitute a fundamental component of the audit process. To streamline and automate this critical task, we introduce a novel approach leveraging large language models and an embedding-based paragraph clustering methodology. This paper assesses our approach across three distinct datasets, including two annotated datasets and one unannotated dataset, all within a zero-shot framework. Our findings reveal highly promising results that significantly enhance the effectiveness and efficiency of the auditing process, ultimately reducing the time required for a thorough and reliable financial report audit.
  • Publication
    Contradiction Detection in Financial Reports
    ( 2023-01-23) ; ;
    Pucknat, Lisa
    ;
    Jacob, Basil
    ;
    Dilmaghani, Tim
    ;
    Nourimand, Mahdis
    ;
    Kliem, Bernd
    ;
    Loitz, Rüdiger
    ;
    ;
    Finding and amending contradictions in a financial report is crucial for the publishing company and its financial auditors. To automate this process, we introduce a novel approach that incorporates informed pre-training into its transformer-based architecture to infuse this model with additional Part-Of-Speech knowledge. Furthermore, we fine-tune the model on the public Stanford Natural Language Inference Corpus and our proprietary financial contradiction dataset. It achieves an exceptional contradiction detection F1 score of 89.55% on our real-world financial contradiction dataset, beating our several baselines by a considerable margin. During the model selection process we also test various financial-document-specific transformer models and find that they underperform the more general embedding approaches.
  • Publication
    KPI-EDGAR: A Novel Dataset and Accompanying Metric for Relation Extraction from Financial Documents
    ( 2022-12) ;
    Ali, Syed Musharraf
    ;
    ;
    Nurchalifah, Desiana Dien
    ;
    Jacob, Basil
    ;
    ;
    We introduce KPI-EDGAR, a novel dataset for Joint Named Entity Recognition and Relation Extraction building on financial reports uploaded to the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system, where the main objective is to extract Key Performance Indicators (KPIs) from financial documents and link them to their numerical values and other attributes. We further provide four accompanying baselines for benchmarking potential future research. Additionally, we propose a new way of measuring the success of said extraction process by incorporating a word-level weighting scheme into the conventional F 1 score to better model the inherently fuzzy borders of the entity pairs of a relation in this domain.
  • Publication
    Towards automating Numerical Consistency Checks in Financial Reports
    ( 2022) ; ;
    Dilmaghani, Tim
    ;
    Kliem, Bernd
    ;
    Loitz, Rüdiger
    ;
    ;
    We introduce KPI-Check, a novel system that automatically identifies and cross-checks semantically equivalent key performance indicators (KPIs), e.g. "revenue"or "total costs", in real-world German financial reports. It combines a financial named entity and relation extraction module with a BERT-based filtering and text pair classification component to extract KPIs from unstructured sentences before linking them to synonymous occurrences in the balance sheet and profit & loss statement. The tool achieves a high matching performance of 73.00% micro F1 on a hold out test set and is currently being deployed for a globally operating major auditing firm to assist the auditing procedure of financial statements.
  • Publication
    Automatic Indexing of Financial Documents via Information Extraction
    ( 2021) ; ;
    Bell , Thiago
    ;
    Gebauer, Michael
    ;
    Ulusay, Bilge
    ;
    Uedelhoven, Daniel
    ;
    Dilmaghani, Tim
    ;
    Loitz, Rüdiger
    ;
    ; ;
    The problem of extracting information from large volumes of unstructured documents is pervasive in the domain of financial business. Enterprises and investors need automatic methods that can extract information from these documents, particularly for indexing and efficiently retrieving information. To this end, we present a scalable end-to-end document processing system for indexing and information retrieval from large volumes of financial documents. While we show our system works for the use case of financial document processing, the entire system itself is agnostic of the document type and machine learning model type. Thus, it can be applied to any large-scale document processing task involving domain-specific extractors.
  • Publication
    Tackling Contradiction Detection in German Using Machine Translation and End-to-End Recurrent Neural Networks
    Natural Language Inference, and specifically Contradiction Detection, is still an unexplored topic with respect to German text. In this paper, we apply Recurrent Neural Network (RNN) methods to learn contradiction-specific sentence embeddings. Our data set for evaluation is a machine-translated version of the Stanford Natural Language Inference (SNLI) corpus. The results are compared to a baseline using unsupervised vectorization techniques, namely tf-idf and Flair, as well as state-of-the art transformer-based (MBERT) methods. We find that the end-to-end models outperform the models trained on unsupervised embeddings, which makes them the better choice in an empirical use case. The RNN methods also perform superior to MBERT on the translated data set.
  • Publication
    RatVec: A General Approach for Low-dimensional Distributed Vector Representations via Rational Kernels
    ( 2019)
    Brito, Eduardo
    ;
    ;
    Domingo-Fernández, Daniel
    ;
    Hoyt, Charles Tapley
    ;
    We present a general framework, RatVec, for learning vector representations of non-numeric entities based on domain-specific similarity functions interpreted as rational kernels. We show competitive performance using k-nearest neighbors in the protein family classification task and in Dutch spelling correction. To promote re-usability and extensibility, we have made our code and pre-trained models available athttps://github.com/ratvec.
  • Publication
    Towards Contradiction Detection in German: A Translation-Driven Approach
    With the recent advancements in Machine Learning based Natural Language Processing (NLP), language dependency has always been a limiting factor for a majority of NLP applications. Typically, models are trained for the English language due to the availability of very large labeled and unlabeled datasets, which also allow to fine tune models for that language. Contradiction Detection is one such problem that has found many practical applications in NLP and up to this point has only been studied in the context of English language. The scope of this paper is to examine a set of baseline methods for the Contradiction Detection task on German text. For this purpose, the well-known Stanford Natural Language Inference (SNLI) data set (110,000 sentence pairs) is machine-translated from English to German. We train and evaluate four classifiers on both the original and the translated data, using state-of-the-art textual data representations. Our main contribution is the first large-scale assessment for this problem in German, and a validation of machine translation as a data generation method. We also present a novel approach to learn sentence embeddings by exploiting the hidden states of an encoder-decoder Sequence-To-Sequence RNN trained for autoencoding or translation.