2020
Conference Paper
Title

A Clustering Backed Deep Learning Approach for Document Layout Analysis

Abstract
Large organizations generate documents and records on a daily basis, often to such an extent that processing them manually becomes unduly time-consuming. Because of this, automated processing systems for documents are desirable, as they would reduce the time spent handling them. Unfortunately, documents are often not designed to be machine-readable, so parsing them is a difficult problem. Image segmentation techniques and deep-learning architectures have been proposed as a solution, but they have difficulty retaining accuracy when page layouts are especially dense. This can lead to data being duplicated, lost, or inaccurate during retrieval. We propose a way of refining these segmentations, using a clustering-based approach that can be easily combined with existing rule-based refinements. We show that on a financial document corpus of 2675 pages, when using DBSCAN, this method is capable of significantly increasing the accuracy of existing deep-learning methods for image segmentation. This improves the reliability of the results in the context of automatic document analysis.
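The abstract names DBSCAN as the clustering method but does not describe how it is applied to the segmentation output. As a rough illustration only, the sketch below runs a minimal DBSCAN over 2D points and derives one bounding box per cluster. Everything in it is an assumption made for the example: the function names (`dbscan`, `cluster_boxes`), the idea that the points are centres of detected text elements, and the values of `eps` and `min_samples`. It is not the authors' pipeline.

```python
from math import hypot

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i."""
    xi, yi = points[i]
    return [j for j, (xj, yj) in enumerate(points)
            if hypot(xi - xj, yi - yj) <= eps]


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: returns one cluster label per point, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already visited
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # core point: keep expanding
    return labels


def cluster_boxes(points, labels):
    """Bounding box (x0, y0, x1, y1) per cluster, ignoring noise points."""
    boxes = {}
    for (x, y), label in zip(points, labels):
        if label == NOISE:
            continue
        x0, y0, x1, y1 = boxes.get(label, (x, y, x, y))
        boxes[label] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes


# Hypothetical centres of text elements inside one predicted segment:
# two dense groups and one stray point.
points = [(10, 10), (12, 11), (11, 13),
          (60, 10), (62, 12), (61, 11),
          (200, 200)]
labels = dbscan(points, eps=5, min_samples=3)
boxes = cluster_boxes(points, labels)
```

In this toy input, a single segment that covers two dense groups would be split into two boxes, and the isolated point would be left unassigned. In practice, `eps` and `min_samples` would need tuning to the page resolution and layout density.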
Author(s)
Agombar, Rhys  
Lübbering, Max  
Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS  
Sifa, Rafet  
Mainwork
Machine Learning and Knowledge Extraction. 4th IFIP TC 5, TC 12, WG 8.4, WG 8.9, WG 12.9 International Cross-Domain Conference, CD-MAKE 2020. Proceedings  
Conference
International Cross-Domain Conference (CD-MAKE) 2020  
International Conference on Availability, Reliability and Security (ARES) 2020  
Open Access
DOI
10.1007/978-3-030-57321-8_23
Language
English