2022
Conference Paper
Title

A Digitization Pipeline for Mixed-Typed Documents Using Machine Learning and Optical Character Recognition

Abstract
Although digitization is advancing rapidly, a large share of the data that companies process is still in printed form. Technologies such as Optical Character Recognition (OCR) support the transformation of printed text into machine-readable content. However, OCR struggles when the data on a document is highly unstructured and includes non-text objects, as is the case, for example, with medical prescriptions. Following Design Science Research (DSR), we propose a flexible processing pipeline that handles both character recognition and object detection. To do so, we derive Design Requirements (DRs) in cooperation with a practitioner who performs prescription billing in the healthcare domain. We then develop a prototype blueprint that is applicable to similar problem formulations. Overall, we contribute to research and practice in multiple ways. First, we provide evidence for selected OCR methods proposed in previous research. Second, we design a machine-learning-based digitization pipeline for printed documents that contain both text and non-text objects, in the context of medical prescriptions. Third, we derive a nascent design pattern for this type of document digitization. This pattern lays the foundation for further research and can support the development of innovative information systems that enable more efficient decision making and, in turn, a more economical use of resources.
Author(s)
Matschak, Tizian
Rampold, Florian
Hellmeier, Malte
Fraunhofer-Institut für Software- und Systemtechnik ISST  
Prinz, Christoph
Trang, Simon
Mainwork
The Transdisciplinary Reach of Design Science Research  
Conference
International Conference on Design Science Research in Information Systems and Technology 2022  
Open Access
DOI
10.1007/978-3-031-06516-3_15
Language
English
Fraunhofer Institute
Fraunhofer-Institut für Software- und Systemtechnik ISST
Keyword(s)
  • Document image analysis

  • Optical character recognition

  • Digitization

  • Machine learning

  • Preprocessing

  • Postprocessing
