Confidence measures for seamless skew and orientation detection in document images

Konya, Iuliu; Eickeler, Stefan; Brandt, Christian


Institute of Electrical and Electronics Engineers -IEEE-; International Association for Pattern Recognition -IAPR-:
13th IAPR International Conference on Document Analysis and Recognition, ICDAR 2015. Conference Proceedings. Vol.2 : Nancy, France, [relocated from Tunisia], 23 - 26 August 2015
Piscataway, NJ: IEEE, 2015
ISBN: 978-1-4799-1806-5
ISBN: 978-1-4799-1805-8
International Conference on Document Analysis and Recognition (ICDAR) <13, 2015, Nancy>
Fraunhofer IAIS
document image processing; UW-I data set; document analysis system; document deskewing; document image digitization; document preprocessing step; expected global skew angle distribution; generic confidence measures; generic skew algorithm; heterogeneous document scans; mass digitization projects; noise patterns; nontext regions; orientation detection algorithm; rotation angles; image edge detection; robustness

Document deskewing is a crucial pre-processing step in any document analysis system. In mass digitization projects, the ability to automatically assess the success of this step enables significant reductions in the amount of work performed by human operators. The current paper extends our generic skew and orientation detection algorithm with the ability to return well-founded confidence values for the detected rotation angles, which in turn may lie anywhere within the range of -180 to 180 degrees. Starting with an in-depth theoretical analysis of all common situations occurring in digitized document images, including large non-text regions as well as noise patterns, we derive a formula for the expected distribution of the global skew angle. Using the results of the analysis, we propose two generic confidence measures which are able to accurately reflect and cover all the aforementioned situations. Finally, the introduced confidence measures are tested on the UW-I data set comprising 979 heterogeneous document scans.