Logical structure recognition for heterogeneous periodical collections
This work introduces a practical method for performing logical layout analysis on heterogeneous periodical collections. The described module is incorporated into the Fraunhofer document image understanding system and has been successfully used in mass digitization projects on more than 500 000 scanned pages. Our primary targets are documents with complex layouts, such as newspapers; however, the described methods can easily be adapted to non-periodical publications. While encouraging, the experimental results obtained on a heterogeneous set of digitized newspaper and chronicle pages spanning about 70 years reflect the high complexity of the generic, automated layout analysis problem. Our results allow the identification of promising areas for future investigation and provide a baseline for current in-the-wild document logical structure recognition.