Automatic Indexing of Financial Documents via Information Extraction

Ramamurthy, Rajkumar; Lübbering, Max; Bell, Thiago; Gebauer, Michael; Ulusay, Bilge; Uedelhoven, Daniel; Dilmaghani, Tim; Loitz, Rüdiger; Pielka, Maren; Bauckhage, Christian; Sifa, Rafet

doi:10.1109/SSCI50451.2021.9659977

2021

Conference Paper

Abstract

The problem of extracting information from large volumes of unstructured documents is pervasive in the financial domain. Enterprises and investors need automatic methods that can extract information from these documents, particularly for indexing and efficient retrieval. To this end, we present a scalable end-to-end document processing system for indexing and information retrieval from large volumes of financial documents. Although we demonstrate the system on financial document processing, it is agnostic to both the document type and the machine learning model type. Thus, it can be applied to any large-scale document processing task involving domain-specific extractors.