2013
Conference Paper
Title
Automated generation of high-quality training data for appearance-based object models
Abstract
Methods for automated person detection and person tracking are essential core components of modern security and surveillance systems. Most state-of-the-art person detectors follow a statistical approach, in which prototypical appearances of persons are learned from training samples with known class labels. The selection of appropriate training samples has a significant impact on the quality of the resulting person detectors. For example, training a classifier on a rigid body model using training samples with strong pose variations is generally ineffective, irrespective of the classifier's capabilities. Apart from performance issues, generating high-quality training data is a very time-consuming process that involves a significant amount of manual work. Furthermore, due to inevitable limitations of freely available training data, the corresponding classifiers are not always transferable to a given sensor and are only applicable to a narrow, well-defined range of scenes and camera setups. Semi-supervised learning methods are a commonly used alternative to supervised training and generally require only a few labeled samples. However, a drawback of semi-supervised methods is that they always include a generative component, which is known to be difficult to learn. Therefore, automated processes for generating training data sets for supervised methods are needed. Such approaches could either help to better adapt classifiers to the respective hardware or serve as a complement to existing data sets. To this end, this paper provides insights into the quality requirements of automatically generated training data for supervised learning methods. Assuming a static camera, labels are generated from motion detection by background subtraction, subject to weak constraints on the bounding boxes enclosing the motion blobs. Since this labeling method consists of standard components, we illustrate its effectiveness by adapting a person detector to the cameras of a sensor network. By varying the training data while keeping the detection framework identical, we derive statements about the sample quality.
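The labeling step described in the abstract (background subtraction on a static camera, then weak constraints on the bounding boxes of motion blobs) can be illustrated with a minimal sketch. This is not the authors' implementation: the running-average background model, the difference threshold, 4-connected blob extraction, and the height/aspect-ratio constraint values are all assumptions chosen for illustration, and every function name below is hypothetical. Frames are plain 2D lists of grayscale values so the example needs only the standard library.

```python
from collections import deque

# Hypothetical sketch, not the paper's method: frames are 2D lists of
# grayscale intensities (0-255) from a static camera.


def update_background(background, frame, alpha=0.05):
    """Assumed running-average background model: B <- (1 - alpha) * B + alpha * F."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]


def foreground_mask(background, frame, threshold=30):
    """Mark pixels whose absolute difference to the background exceeds threshold."""
    return [[abs(f - b) > threshold for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]


def motion_blobs(mask):
    """Return bounding boxes (x0, y0, x1, y1) of 4-connected foreground regions."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill collects one blob and tracks its extent.
            queue = deque([(y, x)])
            seen[y][x] = True
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes


def weakly_constrained(box, min_height=3, min_ratio=1.5, max_ratio=4.0):
    """Assumed weak constraint: box is tall enough and taller than wide, like an upright person."""
    x0, y0, x1, y1 = box
    width, height = x1 - x0 + 1, y1 - y0 + 1
    ratio = height / width
    return height >= min_height and min_ratio <= ratio <= max_ratio


def generate_labels(background, frame):
    """Positive samples = motion blobs whose bounding boxes pass the weak constraints."""
    mask = foreground_mask(background, frame)
    return [box for box in motion_blobs(mask) if weakly_constrained(box)]
```

For example, on an empty background with one upright blob (2 pixels wide, 5 tall) and one flat blob (3 wide, 1 tall), only the upright blob is kept as a person label:

```python
background = [[0] * 8 for _ in range(8)]
frame = [row[:] for row in background]
for y in range(1, 6):
    for x in range(2, 4):
        frame[y][x] = 200   # upright blob
for x in range(5, 8):
    frame[7][x] = 200       # flat blob, rejected by the aspect-ratio constraint
print(generate_labels(background, frame))  # [(2, 1, 3, 5)]
```

Keeping the constraints weak is the point of the design: they discard obviously implausible blobs without requiring a detector, so the quality of the resulting samples depends mainly on the motion segmentation.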