Character enhancement for historical newspapers printed using hot metal typesetting

Konya, Iulio; Eickeler, Stefan; Seibert, Christoph

Postprint urn:nbn:de:0011-n-1926700 (757 KByte PDF)
MD5 Fingerprint: 3e943e89f6adf2733d7e4161b8793c30
© 2011 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Created on: 26.1.2012

International Association for Pattern Recognition -IAPR-, Technical Committee on Graphics Recognition; International Association for Pattern Recognition -IAPR-, Technical Committee on Reading Systems:
International Conference on Document Analysis and Recognition, ICDAR 2011. Vol.2 : Beijing, China, 18 - 21 September 2011; proceedings
Piscataway/NJ: IEEE, 2011
ISBN: 978-0-7695-4520-2 (Online)
ISBN: 978-1-4577-1350-7 (Print)
International Conference on Document Analysis and Recognition (ICDAR) <11, 2011, Beijing>
Conference paper, electronic publication
Fraunhofer IAIS
OCR; retro-digitization; historical documents; hot metal typesetting

We propose a new method for effectively removing the printing artifacts that occur in historical newspapers as a result of problems in hot metal typesetting, a printing technique widely used in the late 19th and early 20th centuries. Such artifacts typically appear as thin lines between single characters or glyphs and are in most cases connected to one of the neighboring characters. The quality of optical character recognition (OCR) is heavily influenced by this type of printing artifact. The proposed method is based on the detection of (near) vertical segments by means of directional single-connected chains (DSCC). To allow the robust processing of complex decorative fonts such as Fraktur, a set of rules is introduced. This allows us to successfully process prints exhibiting artifacts with a stroke width even higher than that of most thin character stems. We evaluate our approach on a dataset consisting of old newspaper excerpts printed using Fraktur fonts. The recognition results on the enhanced images using two independent OCR engines (ABBYY FineReader and Tesseract) show significant improvements over the originals.
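
The abstract names directional single-connected chains (DSCC) as the basis for finding near-vertical segments. The following is a minimal Python sketch of that general idea, not the authors' implementation: it chains thin horizontal black runs from row to row as long as each run has exactly one neighbor in the adjacent row. The function names, the parameters `max_width` and `min_height`, and the column-overlap connectivity test are all assumptions made for illustration. The Fraktur-specific rules mentioned in the abstract are not described in this record and are not reproduced here.

```python
from typing import Dict, List, Sequence, Tuple

Run = Tuple[int, int, int]  # (row, x_start, x_end), x_end exclusive


def horizontal_runs(image: Sequence[Sequence[int]]) -> List[List[Run]]:
    """Collect the horizontal runs of black (non-zero) pixels in every row."""
    rows: List[List[Run]] = []
    for y, row in enumerate(image):
        runs: List[Run] = []
        x, n = 0, len(row)
        while x < n:
            if row[x]:
                start = x
                while x < n and row[x]:
                    x += 1
                runs.append((y, start, x))
            else:
                x += 1
        rows.append(runs)
    return rows


def overlaps(a: Run, b: Run) -> bool:
    """True if two runs from adjacent rows share at least one column."""
    return a[1] < b[2] and b[1] < a[2]


def vertical_chains(image: Sequence[Sequence[int]],
                    max_width: int = 2,
                    min_height: int = 3) -> List[List[Run]]:
    """Return chains of thin runs that are singly connected from row to row.

    max_width and min_height are hypothetical thresholds chosen for this sketch.
    """
    rows = horizontal_runs(image)
    finished: List[List[Run]] = []
    open_chains: Dict[Run, List[Run]] = {}  # last run of a chain -> chain
    for y, runs in enumerate(rows):
        prev = rows[y - 1] if y > 0 else []
        next_open: Dict[Run, List[Run]] = {}
        for run in runs:
            if run[2] - run[1] > max_width:
                continue  # too wide to be part of a thin line
            above = [p for p in prev if overlaps(p, run)]
            chain = None
            if len(above) == 1:
                parent = above[0]
                # the parent must also have exactly one neighbor in this row
                siblings = [q for q in runs if overlaps(parent, q)]
                if len(siblings) == 1 and parent in open_chains:
                    chain = open_chains.pop(parent)
            if chain is None:
                chain = []  # start a new chain at this run
            chain.append(run)
            next_open[run] = chain
        finished.extend(open_chains.values())  # chains that were not extended
        open_chains = next_open
    finished.extend(open_chains.values())
    return [c for c in finished if len(c) >= min_height]
```

Called on a binarized image given as rows of 0/1 values, `vertical_chains` returns one list of runs per candidate segment. Deciding which candidates are artifacts rather than legitimate character strokes would require further rules of the kind the abstract mentions.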