Print processing in Contentus: Restoration of digitized print media
One of the main goals of the Contentus use case was to manage and improve the technical quality of large digital multimedia collections in cultural heritage organizations. Generally, there are two causes of quality impairment in digitized multimedia items: errors during the digitization process and the poor condition of the analog original. While digitization errors may be corrected by re-digitization, deterioration of the analog material can only be counteracted by digital restoration in post-processing. This article showcases a technique developed in Contentus to restore digitized hectograph archive documents, which typically display yellowed paper and faded printing ink. The documents used in this restoration showcase belong to the archive of the Music Information Center of the Association of Composers and Musicologists (MIZ) of the former German Democratic Republic (GDR) and were produced between 1960 and 1989. Hectography was widely used in the GDR to copy documents on a large scale. The showcased restoration method improves the on-screen readability of the texts and, as shown by evaluation, lowers the error rate of optical character recognition. The latter improvement is in turn expected to benefit the automated extraction of semantic entities such as persons, places, and organizations. The technology presented in this article is an example of how corpora of visually degraded analog media can be prepared for semantic search applications based on automatic content indexing, another major goal of the Contentus use case.