2015
Conference Paper
Title
Confidence measures for seamless skew and orientation detection in document images
Abstract
Document deskewing is a crucial pre-processing step in any document analysis system. In mass digitization projects, the ability to automatically assess the success of this step enables significant reductions in the amount of work performed by human operators. The current paper extends our generic skew and orientation detection algorithm with the ability to return well-founded confidence values for the detected rotation angles, which in turn may lie anywhere within the range of -180 to 180 degrees. Starting with an in-depth theoretical analysis of all common situations occurring in digitized document images, including large non-text regions as well as noise patterns, we derive a formula for the expected distribution of the global skew angle. Using the results of the analysis, we propose two generic confidence measures which are able to accurately reflect and cover all the aforementioned situations. Finally, the introduced confidence measures are tested on the UW-I data set comprising 979 heterogeneous document scans.
Tags
- document image processing
- UW-I data set
- document analysis system
- document deskewing
- document image digitization
- document preprocessing step
- expected global skew angle distribution
- generic confidence measures
- generic skew algorithm
- heterogeneous document scans
- mass digitization projects
- noise patterns
- nontext regions
- orientation detection algorithm
- rotation angles
- image edge detection
- robustness