Fraunhofer-Gesellschaft
May 4, 2023
Journal Article
Title

Flexible Hybrid Table Recognition and Semantic Interpretation System

Abstract
Extracting information from documents containing quantitative data in tabular format is an important but still unsolved task due to the heterogeneity of document layouts. This work takes a step toward a solution to this problem. We propose a flexible, hybrid table extraction system consisting of a deep learning-based table detection module, a heuristic-based structure recognition method, and a graph-based semantic interpretation component. The system is modular, supports the most frequent table layouts, and handles both documents in image format and PDF files with embedded text. It outperforms the baseline method and achieves results on par with state-of-the-art approaches on the challenging benchmarks from the ICDAR 2013 and ICDAR 2019 table interpretation competitions. We also correct an issue with the evaluation script used in the latter competition and report extended results of the proposed method in comparison with a leading commercial product. Finally, the system achieves a high F1 score in the scenario where raw documents are given as input and the targeted information is contained in a subset of table columns. It has been evaluated on general-purpose data and biomedical benchmarks. We intend to continuously improve our approach and process data from other domains, e.g., financial documents. To support future research on information extraction from documents, we make the evaluation scripts and results from our experiments publicly available at https://github.com/mnamysl/tabrec-sncs.
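As an illustration of the F1 score mentioned in the abstract, the following minimal sketch computes precision, recall, and F1 over sets of extracted items. It is not the authors' evaluation script (that is available in the linked repository); the function name `precision_recall_f1`, the tuple layout, and the sample values are all hypothetical and chosen only for illustration.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall_f1(
    predicted: Iterable[Hashable], gold: Iterable[Hashable]
) -> Tuple[float, float, float]:
    """Compute set-based precision, recall, and F1 (hypothetical helper)."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical (row index, column header, cell value) tuples from one
# targeted column; these are NOT data or results from the paper.
gold = [(0, "dose", "5 mg"), (1, "dose", "10 mg"), (2, "dose", "20 mg")]
predicted = [(0, "dose", "5 mg"), (1, "dose", "10 mg"), (2, "dose", "25 mg")]

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Treating each extracted item as a hashable tuple keeps the comparison exact-match; a real evaluation would likely need value normalization and a defined matching policy, which this sketch leaves out.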
Author(s)
Namysl, Marcin  
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Esser, Alexander M.
Behnke, Sven  
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Köhler, Joachim  
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Journal
SN Computer Science  
Open Access
DOI
10.1007/s42979-022-01659-z
Language
English
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Keyword(s)
  • Information extraction
  • Document understanding
  • Table detection
  • Table structure recognition
  • Table interpretation
