2015
Conference Paper
Title
Confidence measures for seamless skew and orientation detection in document images
Abstract
Document deskewing is a crucial pre-processing step in any document analysis system. In mass digitization projects, the ability to automatically assess the success of this step enables significant reductions in the amount of work performed by human operators. The current paper extends our generic skew and orientation detection algorithm with the ability to return well-founded confidence values for the detected rotation angles, which in turn may lie anywhere within the range of -180 to 180 degrees. Starting with an in-depth theoretical analysis of all common situations occurring in digitized document images, including large non-text regions as well as noise patterns, we derive a formula for the expected distribution of the global skew angle. Using the results of the analysis, we propose two generic confidence measures which are able to accurately reflect and cover all the aforementioned situations. Finally, the introduced confidence measures are tested on the UW-I data set comprising 979 heterogeneous document scans.
Keyword(s)
document image processing
UW-I data set
document analysis system
document deskewing
document image digitization
document preprocessing step
expected global skew angle distribution
generic confidence measures
generic skew algorithm
heterogeneous document scans
mass digitization projects
noise patterns
nontext regions
orientation detection algorithm
rotation angles
image edge detection
robustness